
Benchmarking hyperspy #2469

Open thomasaarholt opened 4 years ago

thomasaarholt commented 4 years ago

Following some discussions in #1480, I suggested that it would be good to benchmark certain processes in hyperspy, so that we can track whether future PRs speed up (hurrah!) or slow down (boo!) hyperspy functionality.

@tjof2 suggested https://pytest-benchmark.readthedocs.io/en/stable/index.html, which I think would be a nice addition.

I just want to record here which features might be good to benchmark as part of CI. The size of the benchmarked process (typically closely related to signal shape) should be large enough that the "interesting" part of the feature dominates the runtime, rather than the setup. Perhaps aim for a benchmark that takes no more than a second on a "normal" computer?

Features to benchmark

Feel free to add more suggestions.

I suggest that the first step is to get benchmarking working nicely, and once that is stabilised we can start comparing to previous benchmarks to be aware of any slowdowns or speedups.

ericpre commented 4 years ago

Can you elaborate on how it would run, before considering what would be benchmarked?

thomasaarholt commented 4 years ago

Sure!

pytest-benchmark is a plugin to pytest. It automatically benchmarks any test function that takes benchmark as an argument (it is a pytest fixture). To benchmark a process, we first write a function that performs that process, and then a test function that benchmarks it. I’ve included an example at the end.
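
As a minimal sketch of the basic fixture usage (the sum_of_squares function and the array size are made up for illustration, not a proposed hyperspy benchmark):

```python
import numpy as np

def sum_of_squares(arr):
    # The operation we want to time.
    return (arr ** 2).sum()

def test_bench_sum_of_squares(benchmark):
    arr = np.random.random(10_000)
    # benchmark(...) calls the function repeatedly, records the timings,
    # and returns the function's result so it can still be asserted on.
    result = benchmark(sum_of_squares, arr)
    assert result >= 0
```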

To add it to hyperspy, we add entries for pytest-benchmark to conda_environment_dev.yml and setup.py, like for pytest-mpl. I’m not sure if it needs adding anywhere else. Then we can either add benchmarks directly into the other tests, or create a “benchmarks” directory containing the benchmark tests at one of the following locations:

If we want to benchmark import times, we need to use the hyperspy_root/benchmarks directory (see my issue at ionelmc/pytest-benchmark#177, with some possible alternatives in this post - using hyperspy/src/), since if the benchmark tests lie below hyperspy_root/hyperspy, pytest automatically discovers and imports hyperspy. Then hyperspy is already imported, and cannot be properly benchmarked.
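
As an aside, one possible way around the caching problem (purely a sketch under my own assumptions, not what is proposed in #177) is to import hyperspy in a fresh subprocess for each measurement, at the cost of also timing interpreter start-up:

```python
# Hypothetical import-time benchmark; the names here are illustrative only.
import subprocess
import sys

def import_hyperspy_cold():
    # A fresh interpreter guarantees a cold import on every run,
    # but the measurement also includes interpreter start-up time.
    subprocess.run([sys.executable, "-c", "import hyperspy.api"], check=True)

def test_bench_import_hyperspy(benchmark):
    # Imports are slow, so keep the number of rounds small.
    benchmark.pedantic(import_hyperspy_cold, rounds=3, iterations=1)
```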

Comparing benchmarks

With the --benchmark-save=some-name option, a JSON file is saved at current_dir/.benchmarks/OS_Python_version/some_name.json. It would be nice to save this for every minor version (and maybe a “master” version that is continuously overwritten with each merged PR - but that would quickly accumulate if saved in the git history!). Each test contributes about 1.4 kB of JSON, or roughly 10 kB per test across the seven systems we test on. We don’t want to unnecessarily clog the GitHub repository, so perhaps we can think of a smart way to store these.
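
To give an idea of what ends up in those files, here is a hedged sketch that prints the mean time of each saved benchmark. The layout I assume (a top-level "benchmarks" list whose entries carry "name" and a "stats" mapping) and the example path are from memory, so check an actual file before relying on this:

```python
import json
from pathlib import Path

def summarise(path):
    # Print the mean time of every benchmark stored in a saved run.
    data = json.loads(Path(path).read_text())
    for bench in data["benchmarks"]:
        mean_ms = bench["stats"]["mean"] * 1e3
        print(f"{bench['name']}: mean = {mean_ms:.2f} ms")

# Example path only; the directory name depends on OS and Python version.
summarise(".benchmarks/Windows-CPython-3.8-64bit/0001_some-name.json")
```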

Examples

Here is a real-life test in hyperspy/hyperspy/tests/axes/test_axes_manager.py. All benchmarks (even when spread across multiple files) are accumulated and printed together at the end of the test run.

from numpy import zeros
from hyperspy.signals import Signal1D

def axes_iteration(s):
    # Step through every navigation index of the signal.
    for _ in s.axes_manager:
        pass

def test_bench_axes_1000(benchmark):
    "Test iterating through 1000 navigation indices"
    s = Signal1D(zeros((10, 10, 10, 1)))
    # pedantic() gives explicit control: 5 rounds of 5 iterations each.
    benchmark.pedantic(axes_iteration, args=(s,), rounds=5, iterations=5)

This results in the following output:

Benchmark

```python
pytest test_axes_manager.py
================================================================================ test session starts ================================================================================
platform win32 -- Python 3.8.1, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: C:\Users\thomasaar\hyperspy, configfile: setup.cfg
plugins: benchmark-3.2.3, html-2.1.1, metadata-1.10.0
collected 18 items

test_axes_manager.py ..................                                                                                                                                        [100%]

------------------------------------------------- benchmark: 1 tests -------------------------------------------------
Name (time in ms)          Min       Max      Mean  StdDev    Median      IQR  Outliers     OPS  Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------
test_bench_axes_1000  179.9258  199.8367  189.1168  7.9596  191.3189  11.8696       2;0  5.2877       5           5
------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
================================================================================ 18 passed in 5.66s =================================================================================
```

To compare with a previous benchmark, one calls pytest --benchmark-compare=former-benchmark-name, and it prints an output similar to the following:

Comparison

```python
------------------------------------------------------------------------------------------- benchmark: 2 tests -------------------------------------------------------------------------------------------
Name (time in ms)                          Min                Max               Mean           StdDev            Median              IQR  Outliers           OPS  Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_bench_axes_1000 (NOW)            111.2606 (1.0)     127.7398 (1.02)    116.4413 (1.0)    6.4672 (1.80)    114.2255 (1.0)    4.6775 (1.05)       1;1  8.5880 (1.0)        5           5
test_bench_axes_1000 (0002_dc3b82d)   115.0556 (1.03)    124.8423 (1.0)     119.9552 (1.03)   3.5873 (1.0)     120.0614 (1.05)   4.4503 (1.0)        2;0  8.3364 (0.97)       5           5
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
================================================================================ 18 passed in 3.54s =================================================================================
```
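
If this ends up running in CI, it may also be worth looking at pytest-benchmark's --benchmark-compare-fail option (for example --benchmark-compare-fail=mean:5%), which, if I read the docs correctly, makes the run fail when a benchmark regresses beyond the given threshold - that would flag slowdowns automatically rather than relying on someone reading the table.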

Depending on the outcome of the discussion in ionelmc/pytest-benchmark#177, I may suggest that we keep all benchmarks (import benchmarks and function benchmarks) in hyperspy_root/benchmarks/, so that everything lives in one directory rather than in two separate locations.

I'm going to look for some other repos that use pytest-benchmark, and see how they implement the benchmarking comparison.

What do you think about this?

ericpre commented 4 years ago

Maybe I was not specific enough in my previous question: what system is going to run the benchmark, and where are the benchmark results going to be stored? How will the results be pulled for comparison? The examples you are giving are run locally.

Have you looked at https://asv.readthedocs.io?

thomasaarholt commented 4 years ago

I appreciate you being specific. I started this issue to record features that would be nice to benchmark (so I wouldn't forget them) while I read up on how it could be implemented. As you have probably realised, I am new to Azure DevOps and still trying to understand what is possible with it.

Thanks for the suggestion about ASV. It looks really interesting and if I get it working I will suggest an implementation.
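
Having skimmed the asv docs, a benchmark there appears to be a plain module in a benchmarks/ directory (plus an asv.conf.json pointing at the project). A rough sketch, with all names purely illustrative:

```python
# benchmarks/benchmark_axes.py -- hypothetical asv benchmark module.
# asv times any function or method whose name starts with "time_";
# setup() runs before each timing and is excluded from the measurement.
import numpy as np
import hyperspy.api as hs

class AxesManagerSuite:
    def setup(self):
        # Same 10 x 10 x 10 navigation space as the pytest-benchmark example above.
        self.s = hs.signals.Signal1D(np.zeros((10, 10, 10, 1)))

    def time_axes_iteration(self):
        for _ in self.s.axes_manager:
            pass
```

asv run can then benchmark a range of commits, and asv publish turns the stored results into a static HTML report, which might answer the question of where results live better than committing pytest-benchmark JSON files.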