ratatui-org / ratatui

Rust library that's all about cooking up terminal user interfaces (TUIs) 👨‍🍳🐀
https://ratatui.rs
MIT License

Onboard to bencher, to start tracking benchmarks over time #1092

Open · joshka opened 1 month ago

joshka commented 1 month ago

Problem

Rather than having to run benchmarks ad hoc, it would be nice to see the results of our benchmarks over time (particularly in CI) so we can more easily catch any regressions. https://bencher.dev/ seems to be a reasonable product for this, with a free hosted tier for open source projects.

Solution

Onboard to bencher.dev

Alternatives

Do our own thing and carry the maintenance burden of that.

Additional context

Anticipating the obvious argument against this, that CI is noisy: this is covered in their docs. Even a noisy signal easily shows when performance has drastically changed (see https://bencher.dev/docs/explanation/continuous-benchmarking/ for more info).

1000 words: [image]

EdJoPaTo commented 1 month ago

Keep in mind that benchmarks are likely biased towards the specific problem they were written for and might not reflect real user usage. They might miss something that user code relies on, which can give a false sense of what the benchmark covers. This should not stop us from refactoring benchmarks: having useful benchmarks is way more important than keeping historic benchmark comparison graphs intact for longer.

Also, benchmarks are highly dependent on their environment, like the target triple. x64 has different strengths than the various ARM platforms. Apple silicon, for example, has some instructions specifically for making it easier to run code written for x64, which is why Rosetta works so well on Apple ARM. The Windows ARM compatibility layer does not have this, which is why it's way slower. While both are ARM, it's hard to compare them because of things like that.

As with all benchmarks, absolute numbers are only useful on the given target and cannot easily be compared. Changes, on the other hand, can be compared. (Examples: M2 performance core to M2 performance core is comparable with absolute numbers. M2 performance core to efficiency core is not. M2 to a random benchmark platform target is not. M2 to Raspberry Pi is not. The change percentages, however, are comparable, at least for the same target triple & glibc version.)

So this will only ever give a rough idea, and changing benchmark code will result in a new graph. This should only ever be a hint that "there might be something off".

joshka commented 1 month ago

> Changes, on the other hand, can be compared

Yes. This is the entire point of bencher.
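
To be clear about how that maps onto bencher: each series is tracked per branch and per testbed, so absolute numbers are only ever compared within the same environment. Roughly like this (a sketch; `--branch` and `--testbed` are documented flags of the bencher CLI, and the project slug and testbed name here are just placeholders):

```sh
# Results are bucketed per branch and per testbed; runs from a
# different machine land on a different testbed and are never
# compared against this one's absolute numbers.
bencher run \
    --project ratatui \
    --token "$BENCHER_API_TOKEN" \
    --branch main \
    --testbed ubuntu-latest \
    --adapter rust_criterion \
    "cargo bench"
```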

orhun commented 1 month ago

Looks cool! I'm curious what we need to do to move forward with it. Should we reach out to them, or can we just integrate it somehow?

joshka commented 1 month ago

> Looks cool! I'm curious what we need to do to move forward with it. Should we reach out to them, or can we just integrate it somehow?

Read the docs, sign up, do the tasks required.
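
Concretely, something like this (a sketch; the project slug is a placeholder, and the install one-liner is the one from their docs):

```sh
# Install the bencher CLI (one option from their install docs)
curl --proto '=https' --tlsv1.2 -sSfL https://bencher.dev/download/install-cli.sh | sh

# The API token comes from the bencher.dev dashboard after signing up
export BENCHER_API_TOKEN='<token>'

# Run the existing criterion benches and upload the results;
# --adapter rust_criterion tells bencher how to parse `cargo bench` output
bencher run --project ratatui --token "$BENCHER_API_TOKEN" --adapter rust_criterion "cargo bench"
```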

orhun commented 1 month ago

Doesn't seem free btw

[image: screenshot of the bencher pricing page]

Edit: Oops, there is also a free plan, but they made the text so frickin small that I couldn't see it.

I will try packaging the bencher CLI and do some tests.

orhun commented 4 weeks ago

I created a dummy project and uploaded some benchmark results for testing:

```sh
bencher run --project "orhun-s-project" --token $BENCHER_API_TOKEN --adapter rust_criterion "cargo bench"
```

Here is how the barchart render benchmark looks (not very exciting, since there's just a single data point for now):

[image: barchart render benchmark graph]

[share URL]

I think it's worth experimenting with different options.
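
For example, there's an `--err` flag that makes the run exit non-zero when it trips an alert, which would let CI gate on regressions (a sketch; this assumes a threshold has already been configured for the project):

```sh
# Fail the CI job when bencher raises an alert for this run,
# e.g. a benchmark regressing past the project's threshold
bencher run \
    --project orhun-s-project \
    --token "$BENCHER_API_TOKEN" \
    --adapter rust_criterion \
    --err \
    "cargo bench"
```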

orhun commented 4 weeks ago

Okay, with 2 data points it is better:

[image: benchmark graph with two data points]

[url]

epompeii commented 3 weeks ago

If there is anything that is a blocker for you all, please let me know!

orhun commented 2 weeks ago

Will resume work on this pretty soon.