nvzqz / divan

Fast and simple benchmarking for Rust projects
https://nikolaivazquez.com/blog/divan/
Apache License 2.0

Add statistically-significant improvement reporting #48

Open · TheLostLambda opened 6 months ago

TheLostLambda commented 6 months ago

Similar to what criterion does, but I think a useful starting point would just be reporting a ±% change in times between runs (if the two runs are determined to differ significantly, given the variance of each)!

I imagine this is somewhat blocked on writing out the previous benchmark results somewhere they can be referenced first!
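To make that concrete, here's a rough sketch of the idea (not Divan's actual API or output; the per-sample timings, the struct, and the 1.96 large-sample threshold are all just illustrative assumptions): compute the ±% change in mean time between two runs and suppress it when the difference is within the runs' combined variability, using a Welch-style check with a normal approximation.

```rust
/// Summary statistics for one run's per-sample times (nanoseconds).
struct RunStats {
    mean: f64,
    var: f64, // sample variance
    n: usize,
}

impl RunStats {
    fn from_samples(samples: &[f64]) -> Self {
        let n = samples.len();
        let mean = samples.iter().sum::<f64>() / n as f64;
        let var =
            samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n as f64 - 1.0);
        Self { mean, var, n }
    }
}

/// Returns Some(percent change) if the difference in means looks
/// distinguishable from noise, None otherwise. Uses a Welch-style
/// statistic with a large-sample normal approximation (|z| > 1.96 ≈ 95%).
fn percent_change(prev: &RunStats, curr: &RunStats) -> Option<f64> {
    let diff = curr.mean - prev.mean;
    let se = (prev.var / prev.n as f64 + curr.var / curr.n as f64).sqrt();
    if se == 0.0 || (diff / se).abs() < 1.96 {
        return None; // within noise; don't report a change
    }
    Some(100.0 * diff / prev.mean)
}

fn main() {
    let prev = RunStats::from_samples(&[102.0, 99.0, 101.0, 100.0, 98.0]);
    let curr = RunStats::from_samples(&[91.0, 89.0, 90.0, 92.0, 88.0]);
    match percent_change(&prev, &curr) {
        Some(p) => println!("{p:+.1}% vs previous run"),
        None => println!("no significant change"),
    }
}
```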

nvzqz commented 3 months ago

This is a bit of a nuanced issue. Currently, benchmarks don't record any statistics beyond min/max/median/mean time. But I would very much like to do proper statistical analysis across benchmark runs to determine whether a difference is distinguishable from random noise (i.e. statistically significant).

The approach the Stabilizer folks took produced a normal distribution of results, but as an easy-to-pick-up userspace tool, Divan doesn't have the luxury of being an LLVM plugin. That said, benchmark timings tend to follow a log-normal distribution, so perhaps the same stats can be made to work from that?
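One way that could work (a sketch under the log-normal assumption, not a settled design): if per-sample times are roughly log-normal, their logarithms are roughly normal, so the usual mean/variance machinery applies on the log scale, and the difference of mean log-times exponentiates back into a ratio of geometric means, i.e. a ±% change.

```rust
/// Under a log-normal model, compare two runs on the log scale: the
/// logs are approximately normal, so mean/variance-based tests apply,
/// and exp(mean(log curr) - mean(log prev)) is the ratio of geometric means.
fn geometric_change(prev_ns: &[f64], curr_ns: &[f64]) -> (f64, f64) {
    let mean_log = |xs: &[f64]| xs.iter().map(|x| x.ln()).sum::<f64>() / xs.len() as f64;
    let var_log = |xs: &[f64], m: f64| {
        xs.iter().map(|x| (x.ln() - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0)
    };

    let (m_prev, m_curr) = (mean_log(prev_ns), mean_log(curr_ns));
    let se = (var_log(prev_ns, m_prev) / prev_ns.len() as f64
        + var_log(curr_ns, m_curr) / curr_ns.len() as f64)
        .sqrt();

    // Test statistic on the log scale (same Welch-style form as above),
    // plus the percent change implied by the geometric-mean ratio.
    let z = (m_curr - m_prev) / se;
    let pct = 100.0 * ((m_curr - m_prev).exp() - 1.0);
    (z, pct)
}

fn main() {
    let prev = [105.0, 98.0, 110.0, 95.0, 102.0];
    let curr = [88.0, 92.0, 85.0, 90.0, 87.0];
    let (z, pct) = geometric_change(&prev, &curr);
    println!("z = {z:.2}, change = {pct:+.1}% (significant if |z| > ~2)");
}
```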


In line with the intent of this issue, my plan once #10/#42 are complete is to report a ±% change from the previous run, using information previously recorded in target/divan.
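For reference, a minimal sketch of that flow (the file path, the TSV format, and the benchmark names here are hypothetical placeholders, not what #10/#42 will actually specify): record each benchmark's mean time under target/divan, then on the next run load it and print the ±% change.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;

/// Hypothetical on-disk format: one "<benchmark name>\t<mean ns>" line per entry.
fn load_previous(path: &Path) -> HashMap<String, f64> {
    fs::read_to_string(path)
        .unwrap_or_default()
        .lines()
        .filter_map(|line| {
            let (name, mean) = line.split_once('\t')?;
            Some((name.to_owned(), mean.parse().ok()?))
        })
        .collect()
}

fn save_current(path: &Path, results: &HashMap<String, f64>) -> std::io::Result<()> {
    let body: String = results
        .iter()
        .map(|(name, mean)| format!("{name}\t{mean}\n"))
        .collect();
    fs::create_dir_all(path.parent().unwrap())?;
    fs::write(path, body)
}

fn main() -> std::io::Result<()> {
    let path = Path::new("target/divan/previous.tsv"); // hypothetical location
    let previous = load_previous(path);

    // Pretend these are this run's measured mean times, in ns.
    let current = HashMap::from([("parse".to_owned(), 91.0), ("render".to_owned(), 240.0)]);

    for (name, mean) in &current {
        match previous.get(name) {
            Some(prev) => {
                println!("{name}: {:+.1}% vs previous run", 100.0 * (mean - prev) / prev)
            }
            None => println!("{name}: no previous data"),
        }
    }
    save_current(path, &current)
}
```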