nvzqz opened this issue 8 months ago
Hi, this feature can be really fruitful for a project of mine. But I am also interested in comparing more than 2 versions of the same function. Just putting some ideas out there: is there any possibility of generalizing paired benchmarks to N-way benchmarks, where N is the number of versions of the function to compare?
> is there any possibility of generalizing paired benchmarks to N-way benchmarks
It is possible to build N-1 paired measurements if that solves your problem. Which leads me to the following:
> But I am also interested in comparing more than 2 versions of the same function.
It's an interesting use case. Can you elaborate more on the task you are trying to solve?
My case: 1 Rust version (= my implementation) + 2 C versions (with different algorithms) of a performance-critical function. I am also measuring how the performance scales with the input size.
In the N-1 pairs case this seems OK, but it also means running the inner versions twice as often.
Let `N_0, N_1, ..., N_n` be different implementations of the same function; here the inner versions would be `N_1, ..., N_{n-1}`.
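To make the pairing concrete, here is a tiny sketch (the version names and the pairing loop are made up for illustration; this is not Tango's or Divan's API) showing how N implementations reduce to N-1 adjacent pairs, with every inner version landing in two pairs:

```rust
fn main() {
    // Hypothetical: three implementations of the same function,
    // e.g. one Rust version and two C algorithms behind FFI.
    let versions: Vec<(&str, fn(u64) -> u64)> = vec![
        ("rust", |x| x * 3),
        ("c_algo_1", |x| x * 2 + x),
        ("c_algo_2", |x| (x << 1) + x),
    ];

    // N implementations yield N-1 adjacent pairs. Every inner version
    // (here "c_algo_1") appears in two pairs, so it runs twice as often.
    for pair in versions.windows(2) {
        let (name_a, _impl_a) = pair[0];
        let (name_b, _impl_b) = pair[1];
        println!("paired benchmark: {name_a} vs {name_b}");
    }
}
```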
I'm convinced that the best way to implement this today would be to abuse `async`/`await` to get coroutines so that we can swap out the user's stack between benchmarks. This could allow `Bencher::bench` to not require a `'static` lifetime, whereas boxing and storing `Box<dyn Fn() -> impl Future>` would require a `'static` bound on the closure. I think this is feasible, but it unfortunately requires rewriting core components spanning from the Divan entry point through the `Bencher`s passed to each benchmark.
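To illustrate the trick (a sketch only, not Divan's actual design: `bench_coroutine`, `run_interleaved`, and the no-op waker below are all invented for this example), an `async` body compiles to a state machine that can be suspended at each `.await`, so a single thread can alternate between any number of benchmark bodies:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// A future that is pending once, then ready: one cooperative yield point.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Run `iters` iterations of a benchmark body, yielding after each one so
/// that other benchmarks can be interleaved on the same thread.
async fn bench_coroutine(mut body: impl FnMut(), iters: u32) {
    for _ in 0..iters {
        body(); // the timing of this call would be recorded here
        YieldNow(false).await; // suspend; hand control to the next benchmark
    }
}

/// A waker that does nothing: we poll in a tight round-robin loop, so we
/// never need real wakeups.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Poll all coroutines round-robin until each has finished, running one
/// iteration of every benchmark per pass.
fn run_interleaved(mut benches: Vec<Pin<Box<dyn Future<Output = ()>>>>) {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    while !benches.is_empty() {
        benches.retain_mut(|fut| fut.as_mut().poll(&mut cx).is_pending());
    }
}

fn main() {
    // Three versions of "the same function", interleaved N-way.
    let benches: Vec<Pin<Box<dyn Future<Output = ()>>>> = vec![
        Box::pin(bench_coroutine(|| { std::hint::black_box(7u64 * 3); }, 5)),
        Box::pin(bench_coroutine(|| { std::hint::black_box(7u64 * 2 + 7); }, 5)),
        Box::pin(bench_coroutine(|| { std::hint::black_box((7u64 << 1) + 7); }, 5)),
    ];
    run_interleaved(benches);
}
```

Each pass of the loop runs one iteration of every version back to back, so slow drifts in machine state hit all versions roughly equally, and nothing here limits the interleaving to two benchmarks.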
> But I am also interested in comparing more than 2 versions of the same function.
I don't see why we would be limited to 2 benchmarks. We could theoretically interleave an arbitrary number of benchmark runs.
Sorry for the late reply. After several months of experimenting with different approaches in Tango, I think I now have a more or less complete approach to paired testing.
The way it works is the following: each Tango benchmark executable is a mixed object (binary/shared library). The binary part contains the Tango benchmark runner, and the shared-library part (FFI) contains the benchmark registry. Therefore the runner can load benchmarks from itself as well as from any other Tango binary. This way, several benchmarks can be run simultaneously.
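As a rough sketch of the loading half of that design (the `tango_init` symbol, the `BenchFn` layout, and the use of the `libloading` crate are assumptions for illustration, not Tango's actual ABI):

```rust
use std::os::raw::c_char;

use libloading::{Library, Symbol};

/// Hypothetical shape of one benchmark exported across the FFI boundary.
#[repr(C)]
pub struct BenchFn {
    pub name: *const c_char,
    /// Runs one iteration and returns the elapsed time in nanoseconds.
    pub run: unsafe extern "C" fn() -> u64,
}

fn load_peer_registry(path: &str) -> Result<usize, Box<dyn std::error::Error>> {
    // Because the peer executable is a mixed object (binary + shared
    // library), it can be dlopen'ed like an ordinary .so/.dylib.
    let lib = unsafe { Library::new(path)? };

    // Hypothetical entry point every benchmark binary would export,
    // handing back a pointer to its benchmark registry.
    let init: Symbol<unsafe extern "C" fn(*mut *const BenchFn) -> usize> =
        unsafe { lib.get(b"tango_init\0")? };

    let mut registry: *const BenchFn = std::ptr::null();
    let count = unsafe { init(&mut registry) };
    Ok(count)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The path is illustrative; it would point at another benchmark binary.
    let n = load_peer_registry("./target/release/peer_benchmarks")?;
    println!("peer exposes {n} benchmarks");
    Ok(())
}
```

Because both registries end up in one process, the runner can schedule its own benchmarks and the peer's in the same interleaved loop.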
This IMO solves a lot of different problems:

- Paired benchmarking spreads measurement noise across benchmarks (this is the approach used in Tango).
@bazhenov, I would love to collaborate on what this approach would look like in Divan. 🙂