
pygfx / pylinalg

Linear algebra utilities for Python

https://pylinalg.readthedocs.io/

MIT License

11 stars, 3 forks

Tracking Benchmarks #29

Open FirefoxMetzger opened 1 year ago

FirefoxMetzger commented 1 year ago

Since we constantly talk about improving the performance of the routines provided by pylinalg, I was wondering if there are any plans to formally track benchmarks of the various routines, to make sure that we don't regress in terms of speed.

Korijn commented 1 year ago

I've done a bit of research on benchmarking and found it hard to come up with a good strategy. What do you propose?

ivoflipse commented 1 year ago

Have a look at https://github.com/python/pyperformance which is used for benchmarking Python itself. It allows you to create a suite of tests to run regularly.

The biggest challenge comes from having a properly configured benchmark machine, as a lot of factors can contribute to different benchmark results that have nothing to do with the actual code changes. But I still think it's worthwhile to check for unintended regressions that are big enough to be caught by noisy benchmarking.

As long as you use the same hardware and follow some best practices for benchmarking, I think you'll be fine. Having some numbers may be better than none at all. We can always improve the benchmark setup once the noise starts to dominate.
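A rough sketch of what "big enough to be caught by noisy benchmarking" could look like in practice: take the best of several repeats with the standard library's `timeit`, and only flag slowdowns beyond a generous tolerance. The `dot3` routine, the helper names, and the 25% tolerance are hypothetical placeholders for illustration; they are not part of pylinalg or pyperformance.

```python
import timeit


def dot3(a=(1.0, 2.0, 3.0), b=(4.0, 5.0, 6.0)):
    """Hypothetical stand-in for a routine under test (not a pylinalg function)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def best_time(func, repeat=5, number=1000):
    """Return the best per-call time in seconds over several repeats.

    The minimum is used because it is the least affected by background noise.
    """
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def is_regression(baseline, current, tolerance=0.25):
    """Flag only slowdowns larger than the tolerance (assumed: 25%)."""
    return current > baseline * (1.0 + tolerance)


if __name__ == "__main__":
    # In a real setup the baseline would come from an earlier run on the
    # same machine; here both numbers are measured back to back.
    baseline = best_time(dot3)
    current = best_time(dot3)
    print("regression detected:", is_regression(baseline, current))
```

A wide tolerance trades sensitivity for fewer false alarms, which fits the idea of starting with some numbers and tightening the setup later.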

FirefoxMetzger commented 1 year ago

Scikit-Image also maintains a benchmark suite. At the time I was contributing it was still being finalized, so I can't say how much it is actually enforced, but it could serve as inspiration:

Docs: https://scikit-image.org/docs/stable/contribute.html#benchmarks

CI: https://github.com/scikit-image/scikit-image/blob/main/.github/workflows/benchmarks.yml

Korijn commented 1 year ago

I really like the airspeed velocity (asv) tool that sk-image uses! Like Ivo indicates, to me the core issue has always been how to compare results from one benchmark run to another, between machines and even on the same machine over time. The concept that asv brings to the table - just run the benchmark for older commits here and now on the same machine, and compare - is a pretty clever solution!
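For reference, a minimal sketch of what a benchmark in the airspeed velocity (asv) style might look like. asv times methods whose names start with `time_` and calls `setup` before them; everything else here is an assumption for illustration: `normalize` is a hypothetical stand-in rather than a pylinalg function, and the class name and input vector are made up.

```python
import math


def normalize(vector):
    """Hypothetical stand-in for a library routine (not part of pylinalg)."""
    length = math.sqrt(sum(c * c for c in vector))
    return tuple(c / length for c in vector)


class TimeNormalize:
    """Benchmark class in the style asv discovers inside a benchmarks directory."""

    def setup(self):
        # Prepare inputs here so their construction is not part of the timing.
        self.vector = (3.0, 4.0, 12.0)

    def time_normalize(self):
        # asv reports how long this method takes to run.
        normalize(self.vector)
```

Because the suite lives in the repository, the same file can be run against an older commit and the current one on the same machine, which is the comparison described above.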

epompeii commented 1 year ago

I've been working on a continuous benchmarking tool called Bencher (https://github.com/bencherdev/bencher) that supports both use cases: tracking benchmarks over time for comparison, or relative benchmarking (similar to asv). The idea is for it to be like a pyperformance for your application code. Would that be helpful here?