Yes, some sort of automated benchmarking could be valuable, especially for noticing and fixing regressions. I've done occasional benchmarks before to optimize bottlenecks (e.g., class constructors) but it's all been ad-hoc stuff with `%timeit` in IPython.
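For concreteness, that ad-hoc workflow is roughly the following (an IPython session; the `DataArray` constructor call is just an illustrative target, not a specific bottleneck):

```python
# Ad-hoc timing in an IPython session; the constructor call below is only an
# illustrative target, not a known bottleneck.
import numpy as np
import xarray as xr

%timeit xr.DataArray(np.random.randn(1000, 1000), dims=("x", "y"))
```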
ASV seems like a pretty sane way to do this. pytest-benchmark can trigger test failures if performance falls below some set threshold, but I suspect performance is too noisy and machine-dependent for that to be reliable.
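For reference, a pytest-benchmark test is just an ordinary test that takes the plugin's `benchmark` fixture; the operation timed here is purely illustrative:

```python
# Hedged sketch of a pytest-benchmark test; the ``benchmark`` fixture comes
# from the plugin, and the operation being timed is illustrative only.
import numpy as np
import xarray as xr


def test_dataarray_mean(benchmark):
    da = xr.DataArray(np.random.randn(1000, 1000), dims=("x", "y"))
    # The fixture runs the callable repeatedly and records timing statistics.
    benchmark(lambda: da.mean())
```

The failure-on-regression part comes from comparing against a saved run, e.g. `pytest --benchmark-autosave` once and then `pytest --benchmark-compare --benchmark-compare-fail=mean:10%` on later runs (going from my reading of the pytest-benchmark docs; the exact flags are worth double-checking).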
Yes, ASV is good. I'm surprised there isn't something you can ask to just "robustly time these tests", so it can bolt on without writing new code. Although maybe the overlap between test code and benchmark code isn't as great as I imagine.
One issue is that unit tests are often not good benchmarks. Ideal unit tests are as fast as possible, whereas ideal benchmarks should be run on more typical inputs, which may be much slower.
Another 👍 for benchmarking. Especially as we start to get deep into integrating dask.distributed, having robust performance benchmarks will be very useful. One challenge is where to deploy the benchmarks. TravisCI might not be ideal, since performance can vary depending on competition from other virtual machines on the same system.
We would also benefit from this specifically for #1198 :+1:
Is anyone interested in working on this with me over the next few months? Given the number of issues we've been seeing, I'd like to see this come together this summer. I think ASV is the natural starting point.
I am very interested. I have been doing a lot of benchmarking already wrt dask.distributed on my local cluster, focusing on performance with multi-terabyte datasets. At this scale, certain operations emerge as performance bottlenecks (e.g. index alignment of multi-file netcdf datasets, #1385).
I think this should probably be done in AWS or Google Cloud. That way we can establish a consistent test environment for benchmarking. I might be able to pay for that (especially if our proposal gets funded)!
@rabernat - great. I've set up an ASV project and am in the process of teaching myself how it all works. I'm just playing with some simple arithmetic benchmarks for now (a sketch of what that looks like is below) but, of course, most of our interest will be in the I/O and dask arenas.
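For anyone following along, the ASV layout is just a `benchmarks/` directory of classes with `setup` and `time_*` methods; here's a minimal sketch (the file name, array sizes, and operations are all placeholders):

```python
# benchmarks/arithmetic.py -- hypothetical file name; sizes are illustrative.
import numpy as np
import xarray as xr


class Arithmetic:
    def setup(self):
        # ASV calls ``setup`` before timing, so construction cost is excluded.
        self.da = xr.DataArray(np.random.randn(1000, 1000), dims=("x", "y"))

    def time_add(self):
        # Any method prefixed with ``time_`` gets timed by ``asv run``.
        self.da + self.da

    def time_mean_over_x(self):
        self.da.mean(dim="x")
```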
I'm wondering if @mrocklin has seen ASV used with any dask projects. We'll just need to make sure we choose the appropriate timer when profiling dask functions.
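To make the timer concern concrete: with dask-backed data most xarray operations are lazy, so a naive `time_*` method would mostly measure graph construction. Something like the sketch below (the chunk size and the explicit `.compute()` are my assumptions) forces the work to actually run:

```python
# Illustrative only: with a dask-backed DataArray the timed method needs to
# force computation, otherwise the benchmark mostly measures graph building.
# Assumes dask is installed.
import numpy as np
import xarray as xr


class DaskMean:
    def setup(self):
        self.da = xr.DataArray(
            np.random.randn(4000, 4000), dims=("x", "y")
        ).chunk({"x": 1000})

    def time_graph_only(self):
        # Lazy: builds the dask graph but does not execute it.
        self.da.mean(dim="x")

    def time_computed(self):
        # Triggers execution, which is what we actually want to time.
        self.da.mean(dim="x").compute()
```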
@TomAugspurger has done some ASV work with Dask itself.
Because xarray is all Python and generally not doing much compute itself (i.e. it marshals other libraries to do that), this hasn't been that important.
IIRC most of the performance issues have arisen where xarray builds on (arguably) shaky foundations, like `PeriodIndex`. Though as we mature, is it worth adding some benchmarks?
If so, what's a good way to do this? Pandas uses asv successfully. I don't have experience with https://github.com/ionelmc/pytest-benchmark but that could be a lower-cost way of getting started. Any others?