Yes, some sort of automated benchmarking could be valuable, especially for noticing and fixing regressions. I've done occasional benchmarks before to optimize bottlenecks (e.g., class constructors) but it's all been ad-hoc stuff with `%timeit` in IPython.
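For concreteness, that ad-hoc workflow is roughly the following (an IPython session; the `DataArray` constructor call is just an illustrative target, not a specific bottleneck):

```python
# Ad-hoc timing in an IPython session; the constructor call below is only an
# illustrative target, not a known bottleneck.
import numpy as np
import xarray as xr

%timeit xr.DataArray(np.random.randn(1000, 1000), dims=("x", "y"))
```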
ASV seems like a pretty sane way to do this. pytest-benchmark can trigger test failures if performance falls below some set threshold, but I suspect performance is too noisy and machine-dependent for that to be reliable.
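For reference, a pytest-benchmark test is just an ordinary test that takes the plugin's `benchmark` fixture; the operation timed here is purely illustrative:

```python
# Hedged sketch of a pytest-benchmark test; the ``benchmark`` fixture comes
# from the plugin, and the operation being timed is illustrative only.
import numpy as np
import xarray as xr


def test_dataarray_mean(benchmark):
    da = xr.DataArray(np.random.randn(1000, 1000), dims=("x", "y"))
    # The fixture runs the callable repeatedly and records timing statistics.
    benchmark(lambda: da.mean())
```

The failure-on-regression part comes from comparing against a saved run, e.g. `pytest --benchmark-autosave` once and then `pytest --benchmark-compare --benchmark-compare-fail=mean:10%` on later runs (going from my reading of the pytest-benchmark docs; the exact flags are worth double-checking).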
Yes, ASV is good. I'm surprised there isn't something you can ask to just "robustly time these tests", so it can bolt on without writing new code. Although maybe the overlap between test code and benchmark code isn't as great as I imagine.
One issue is that unit tests are often not good benchmarks. Ideal unit tests are as fast as possible, whereas ideal benchmarks should be run on more typical inputs, which may be much slower.
Another 👍 for benchmarking. Especially as we start to get deep into integrating dask.distributed, having robust performance benchmarks will be very useful. One challenge is where to deploy the benchmarks. TravisCI might not be ideal, since performance can vary depending on competition from other virtual machines on the same system.
We would also benefit from this specifically for #1198 :+1:
Is anyone interested in working on this with me over the next few months? Given the number of issues we've been seeing, I'd like to see this come together this summer. I think ASV is the natural starting point.
I am very interested. I have been doing a lot of benchmarking already wrt dask.distributed on my local cluster, focusing on performance with multi-terabyte datasets. At this scale, certain operations emerge as performance bottlenecks (e.g. index alignment of multi-file netcdf datasets, #1385).
I think this should probably be done in AWS or Google Cloud. That way we can establish a consistent test environment for benchmarking. I might be able to pay for that (especially if our proposal gets funded)!
@rabernat - great. I've set up an ASV project and am in the process of teaching myself how it all works. I'm just playing with some simple arithmetic benchmarks for now (a sketch of what that looks like is below) but, of course, most of our interest will be in the I/O and dask arenas.
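For anyone following along, the ASV layout is just a `benchmarks/` directory of classes with `setup` and `time_*` methods; here's a minimal sketch (the file name, array sizes, and operations are all placeholders):

```python
# benchmarks/arithmetic.py -- hypothetical file name; sizes are illustrative.
import numpy as np
import xarray as xr


class Arithmetic:
    def setup(self):
        # ASV calls ``setup`` before timing, so construction cost is excluded.
        self.da = xr.DataArray(np.random.randn(1000, 1000), dims=("x", "y"))

    def time_add(self):
        # Any method prefixed with ``time_`` gets timed by ``asv run``.
        self.da + self.da

    def time_mean_over_x(self):
        self.da.mean(dim="x")
```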
I'm wondering if @mrocklin has seen ASV used with any dask projects. We'll just need to make sure we choose the appropriate timer when profiling dask functions.
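To make the timer concern concrete: with dask-backed data most xarray operations are lazy, so a naive `time_*` method would mostly measure graph construction. Something like the sketch below (the chunk size and the explicit `.compute()` are my assumptions) forces the work to actually run:

```python
# Illustrative only: with a dask-backed DataArray the timed method needs to
# force computation, otherwise the benchmark mostly measures graph building.
# Assumes dask is installed.
import numpy as np
import xarray as xr


class DaskMean:
    def setup(self):
        self.da = xr.DataArray(
            np.random.randn(4000, 4000), dims=("x", "y")
        ).chunk({"x": 1000})

    def time_graph_only(self):
        # Lazy: builds the dask graph but does not execute it.
        self.da.mean(dim="x")

    def time_computed(self):
        # Triggers execution, which is what we actually want to time.
        self.da.mean(dim="x").compute()
```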
@TomAugspurger has done some ASV work with Dask itself.
Because xarray is all Python and generally not doing much compute itself (i.e. it marshals other libraries to do that), this hasn't been that important.
IIRC most of the performance issues have arisen where xarray builds on (arguably) shaky foundations, like `PeriodIndex`. Though as we mature, is it worth adding some benchmarks?
If so, what's a good way to do this? Pandas uses asv successfully. I don't have experience with https://github.com/ionelmc/pytest-benchmark but that could be a lower-cost way of getting started. Any others?