Closed GopherJ closed 2 years ago
`cargo bench` is a wrapper around functionality in `rustc`. If you want to change the behavior, https://github.com/rust-lang/rust is probably a better place to discuss it. On the other hand, `bench` is unstable exactly because it is not robust or flexible enough. There is work in `rustc` to make it more of a plugin system. The recommendation at this time is to use https://crates.io/crates/criterion.
Transferred to the rust-lang/rust repository, as that is where the libtest harness lives. Unfortunately, I don't think it is likely there will be much work done on libtest's benchmarking, as the future is currently uncertain (see #29553 and #66287). You will likely get better support from external benchmarking frameworks like criterion.
`#[bench]` measures iterations per wall-time interval, more or less.
So if you don't want to switch to a different benchmark crate that supports instruction counting or does more sophisticated analysis, you'll have to bring your system into a state that causes less variance, i.e. shut down background tasks, disable CPU clock boosting, and check for thermal throttling, which is often a problem when benchmarking on laptops.
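To make the wall-time point concrete, here is a minimal std-only sketch of that kind of measurement. It is not libtest's actual implementation; `ns_per_iter`, `median`, and `measure` are hypothetical helpers, and the summing workload is only a placeholder. It times a closure with `std::time::Instant`, repeats that a few times, and reports the median together with the spread between the fastest and slowest sample, which is roughly the noise you see between runs.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: average wall-clock nanoseconds per call of `f`
/// over `iters` calls.
fn ns_per_iter<F: FnMut()>(iters: u32, mut f: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}

/// Hypothetical helper: median of a non-empty sample set (sorts in place).
fn median(samples: &mut [f64]) -> f64 {
    samples.sort_by(|a, b| a.total_cmp(b));
    let mid = samples.len() / 2;
    if samples.len() % 2 == 0 {
        (samples[mid - 1] + samples[mid]) / 2.0
    } else {
        samples[mid]
    }
}

/// Collect `samples` timings and return (median, relative spread),
/// where spread = (max - min) / median.
fn measure<F: FnMut()>(samples: usize, iters: u32, mut f: F) -> (f64, f64) {
    let mut times: Vec<f64> = (0..samples)
        .map(|_| ns_per_iter(iters, &mut f))
        .collect();
    let med = median(&mut times);
    // `median` sorted the slice, so the first and last entries are min and max.
    let spread = (times[times.len() - 1] - times[0]) / med;
    (med, spread)
}

fn main() {
    // Placeholder workload: sum a vector. `black_box` keeps the optimizer
    // from removing the computation.
    let data: Vec<u64> = (0..10_000).collect();
    let (med, spread) = measure(30, 1_000, || {
        black_box(black_box(&data).iter().sum::<u64>());
    });
    println!("median: {med:.1} ns/iter, spread: {:.1}%", spread * 100.0);
}
```

Running this a few times on a busy machine and then on a quiet one should show how much the spread depends on system state rather than on the code being measured.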
@the8472 even with that, the results can change a lot :)
At least in the `Vec`-related things I have been working on recently, I have seen variances for a null run in the 2-10% range, with two outliers around 20% (among dozens of benchmarks). But those are pure CPU/memory throughput benchmarks. If you start doing syscalls or even randomized allocations, things will become noisier.
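One way to use a null run, sketched under assumptions: benchmark the same code twice, treat the relative difference as a noise floor, and only flag a change as a regression when it exceeds that floor. The helper names and all numbers below are made up for illustration; they are not measurements from this thread, and real tools such as criterion use more careful statistics.

```rust
/// Relative change of `new_ns` against `base_ns`; 0.10 means 10% slower.
fn relative_change(base_ns: f64, new_ns: f64) -> f64 {
    (new_ns - base_ns) / base_ns
}

/// Hypothetical rule: report a regression only when the slowdown is larger
/// than the noise floor observed in a null run.
fn is_regression(base_ns: f64, new_ns: f64, noise_floor: f64) -> bool {
    relative_change(base_ns, new_ns) > noise_floor
}

fn main() {
    // Null run: the same code benchmarked twice (illustrative numbers).
    let (null_a, null_b) = (1000.0, 1080.0);
    let noise_floor = relative_change(null_a, null_b).abs();

    // Candidate change measured against the same baseline (illustrative).
    let (base, new) = (1000.0, 1050.0);
    println!(
        "noise floor: {:.1}%, change: {:.1}%, regression: {}",
        noise_floor * 100.0,
        relative_change(base, new) * 100.0,
        is_regression(base, new, noise_floor)
    );
}
```

With a noise floor of several percent, a 5% slowdown cannot be distinguished from run-to-run variance.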
I'm going to go ahead and close this issue, as it seems to me that it's largely a consequence of the overall bench design (wall time, not instruction counts, for example) which seems unlikely to get much more sophisticated inside the standard library. And, realistically, unless you're doing software emulation of some kind, most larger benchmarks will have some amount of uncertainty, especially if they have syscalls or the like.
Describe the problem you are trying to solve
Currently `cargo bench` isn't very stable: it doesn't run long enough, and the results can vary a lot (20-30%), which makes it hard to know whether there is really a regression.
Describe the solution you'd like
No, sorry.
Notes