thomasaarholt opened this issue 4 years ago
Can you elaborate on how it would run, before considering what would be benchmarked?
Sure!
pytest-benchmark is a plugin for pytest. It automatically benchmarks any test function that takes a benchmark argument. To benchmark a process, we first write a function that performs that process, and then a second function that benchmarks it. I’ve included an example at the end.
To add it to hyperspy, we add entries for pytest-benchmark to conda_environment_dev.yml and setup.py, as was done for pytest-mpl. I’m not sure if it needs adding anywhere else. Then we can either add benchmarks directly into the other tests, or create a “benchmarks” directory containing the benchmark tests at one of the following locations:
hyperspy_root/benchmarks
hyperspy_root/hyperspy/benchmarks
hyperspy_root/hyperspy/tests/benchmarks
If we want to benchmark import times, we need to use the hyperspy_root/benchmarks directory (see my issue at ionelmc/pytest-benchmark#177, with some possible alternatives in this post - using hyperspy/src/), since if the benchmark tests lie below hyperspy_root/hyperspy, pytest automatically discovers and imports hyperspy during collection. Hyperspy is then already imported, and its import time cannot be properly benchmarked.
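One way around the "already imported" problem, as a rough sketch, is to time the import in a fresh subprocess so nothing in the test process can pre-import the package. The helper below is not part of pytest-benchmark; it uses the stdlib `json` module as a cheap stand-in target, where `hyperspy.api` would be the real one:

```python
import statistics
import subprocess
import sys


def time_cold_import(module, repeats=3):
    """Time `import <module>` in a fresh interpreter for each run,
    so imports already performed in this process cannot skew the
    result. Returns the median wall-clock time in seconds.
    (Sketch only; in practice `module` would be 'hyperspy.api'.)"""
    samples = []
    for _ in range(repeats):
        code = (
            "import time;"
            "t0 = time.perf_counter();"
            f"import {module};"
            "print(time.perf_counter() - t0)"
        )
        out = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, check=True,
        )
        samples.append(float(out.stdout))
    return statistics.median(samples)


print(f"cold import of json: {time_cold_import('json'):.4f} s")
```

A pytest-benchmark test could wrap such a helper, at the cost of subprocess startup overhead being included in each sample.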
With the --benchmark-save=some-name option, a JSON file is saved at current_dir/.benchmarks/OS_Python_version/some_name.json. It would be nice to save this for every minor version (and maybe a “master” version that is continuously overwritten with each merged PR - though that would quickly accumulate if saved in the git history!). Each test contributes about 1.4 kB of JSON, or about 10 kB per test across the seven systems we test on. We don’t want to clog the GitHub repository unnecessarily, so perhaps we can think of a smart way to store these.
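If we do store these files, summarising them is straightforward. The sketch below assumes the saved file has a top-level "benchmarks" list whose entries carry a "name" and a "stats" dict with a "mean" field - check the schema of your pytest-benchmark version before relying on it:

```python
import json
from pathlib import Path


def summarize_benchmarks(path):
    """Map test name -> mean time (s) from a saved benchmark file.
    Assumes pytest-benchmark's JSON layout: a top-level "benchmarks"
    list with "name" and "stats"["mean"] per entry (verify against
    your version's actual output)."""
    data = json.loads(Path(path).read_text())
    return {b["name"]: b["stats"]["mean"] for b in data["benchmarks"]}


# Tiny fabricated file in the assumed layout, for illustration:
example = {
    "benchmarks": [
        {"name": "test_bench_axes_1000", "stats": {"mean": 0.0123}},
    ]
}
Path("example.json").write_text(json.dumps(example))
print(summarize_benchmarks("example.json"))
# {'test_bench_axes_1000': 0.0123}
```

A small script like this could diff two saved runs and flag regressions without needing the full pytest session.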
Here is a real-life test from hyperspy/hyperspy/tests/axes/test_axes_manager.py. All benchmarks (even when spread across multiple files) are accumulated and printed together at the end of the test run.
from numpy import zeros
from hyperspy.signals import Signal1D

def axes_iteration(s):
    # Iterate through every navigation index of the signal
    for i in s.axes_manager:
        pass

def test_bench_axes_1000(benchmark):
    "Test iterating through 1000 navigation indices."
    s = Signal1D(zeros((10, 10, 10, 1)))
    benchmark.pedantic(axes_iteration, args=(s,), rounds=5, iterations=5)
This results in the following output:
To compare with a previous benchmark, one calls pytest --benchmark-compare=former-benchmark-name, and it prints an output similar to the following:
Depending on the outcome of the discussion in ionelmc/pytest-benchmark#177, I may suggest that we keep all benchmarks in hyperspy_root/benchmarks/, so that we don't end up with two separate sets of benchmarks (import benchmarks and function benchmarks) but instead keep them all in one directory.
I'm going to look for some other repos that use pytest-benchmark, and see how they implement the benchmarking comparison.
What do you think about this?
Maybe I was not specific enough in my previous question: what system is going to run the benchmarks, and where are the benchmark results going to be stored? How will the results be pulled for comparison? The examples you are giving run locally.
Have you looked at https://asv.readthedocs.io?
I appreciate you being specific. I started this issue to record features that would be nice to benchmark (so I wouldn't forget them) while I read up on how it could be implemented. As you have probably realised, I am new to Azure DevOps and still trying to understand what is possible to do with it.
Thanks for the suggestion about ASV. It looks really interesting and if I get it working I will suggest an implementation.
Following some discussions in #1480, I raised the thought that it would be good to benchmark certain processes in hyperspy, so that we can track whether future PRs speed up (hurrah!) or slow down (boo!) hyperspy functionality.
@tjof2 suggested https://pytest-benchmark.readthedocs.io/en/stable/index.html, which I think would be a nice addition.
I just want to record here which features might be good to benchmark as part of CI. The size of the benchmarked process (typically closely related to signal shape) should be large enough that the "interesting" part of the feature dominates the runtime, rather than the setup. Perhaps aim for a benchmark that takes no more than a second on a "normal" computer?
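To sanity-check that a candidate benchmark stays under roughly a second before wiring it into CI, a small stdlib timer mimicking pytest-benchmark's rounds/iterations idea can help. This is a stand-in sketch, not pytest-benchmark itself; the helper name and defaults are made up here:

```python
import time
from statistics import median


def quick_time(func, *args, rounds=5, iterations=5):
    """Rough stand-in for pytest-benchmark's pedantic mode: run
    `func(*args)` `iterations` times per round, and return the
    median per-call wall-clock time (seconds) over `rounds` rounds."""
    per_round = []
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(iterations):
            func(*args)
        per_round.append((time.perf_counter() - start) / iterations)
    return median(per_round)


# Example: time a cheap list comprehension
per_call = quick_time(lambda: [i * i for i in range(1000)])
print(f"median per-call time: {per_call:.2e} s")
```

If the median per-call time is well under a second, the candidate is probably sized appropriately for CI.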
Features to benchmark
import hyperspy.api
import hyperspy.api_nogui
from hyperspy._signals import Signal1D
AxesManager of a given size
m.fit on a certain signal
m.multifit on a certain navigator size + signal
Feel free to come with more suggestions.
I suggest that the first step is to get benchmarking working nicely, and once that is stabilised we can start comparing to previous benchmarks to be aware of any slowdowns or speedups.