Open kratsg opened 4 years ago
Probably also worth looking at airspeed velocity as this seems to be basically exactly what I had in mind.
This might be worth looking into if we can get an external grant to pay for us to run a small Digital Ocean or AWS instance to host this. Seems pretty valuable.
NumPy and SciPy use asv for benchmarks (see, e.g., the asv benchmarks for NumPy), so it might be worth looking at how they do it. An interesting thing is that asv will go and run tests on old commits automatically, so you can automatically build the performance history.
I think(?) this might be possible to do with just a repo over in the pyhf org that runs things on a cron job.
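As a rough sketch of what that could look like, here is a minimal asv benchmark module for a pyhf-style workload. The file layout and benchmark names are hypothetical (there is no such suite in pyhf yet); the model-building and fit calls follow the standard simple-model example from pyhf's docs.

```python
# benchmarks/benchmarks.py -- hypothetical layout for a pyhf asv suite.
# asv discovers functions whose names start with time_ (wall time),
# mem_/peakmem_ (memory), etc., and times them on each commit it checks out.
import pyhf


def setup():
    # Module-level setup runs before the benchmarks in this file.
    pyhf.set_backend("numpy")


def time_uncorrelated_background_fit():
    # Times model construction plus an unconstrained maximum-likelihood fit.
    model = pyhf.simplemodels.uncorrelated_background(
        signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
    )
    data = [51.0, 48.0] + model.config.auxdata
    pyhf.infer.mle.fit(data, model)
```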
Cf. also "Is GitHub Actions suitable for running benchmarks?", where the answer is: yes.
And https://github.com/pydata/xarray/pull/5796 provides basically a template for how to do all of this!
In https://github.com/glotzerlab/signac/pull/776 @bdice mentions:

We deleted the CI script for benchmarks from signac 2.0 anyway, because it's not reliable and we want to use asv instead.
@bdice I would love to talk to you about asv sometime as we've been wanting to set that up for pyhf for a while but haven't yet. If you have insights on how to get going with it I'd be quite keen to learn.
You can see signac's benchmarks defined here: https://github.com/glotzerlab/signac/blob/master/benchmarks/benchmarks.py
And the asv config: https://github.com/glotzerlab/signac/blob/master/asv.conf.json
And here's a quick reference I wrote on how to use asv: https://docs.signac.io/projects/core/en/latest/support.html#benchmarking
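For orientation, asv's class-based benchmarks generally look like the sketch below. This is a generic illustration of asv's conventions rather than a copy of signac's suite: setup runs before and teardown after each benchmark, once per parameter combination, and params/param_names make asv sweep over inputs.

```python
# A generic asv class-based benchmark (illustrative; not signac's actual suite).
# asv calls setup() before and teardown() after each benchmark, once per
# parameter combination, and reports a separate timing for every combination.
import numpy as np


class ArraySuite:
    # Each benchmark runs once per entry in params.
    params = [1_000, 100_000]
    param_names = ["n"]

    def setup(self, n):
        # Prepare inputs outside the timed region.
        self.data = np.random.default_rng(42).normal(size=n)

    def teardown(self, n):
        # Drop references so one parameter set cannot affect the next.
        del self.data

    def time_sort(self, n):
        np.sort(self.data)

    def peakmem_sort(self, n):
        np.sort(self.data)
```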
I have mixed feelings about it. It can be difficult to make asv do what I want sometimes, and the project's development has been rather slow. Sometimes I wish for features that don't exist (like being able to have greater control over test setup/teardown to ensure that caches are cleared between runs without having to regenerate input data -- something like pytest fixtures would be helpful). I've run into a handful of situations while running asv that felt like bugs but were difficult to trace down. I don't know of better alternatives to asv unless you have the time and energy to roll your own Python scripts, which is what signac had done for a long time. Eventually the maintenance of those DIY scripts and their limitations were annoying enough that outsourcing to asv felt like a good decision.
edit: I read some of the thread above. I have had really mediocre experiences with running benchmarks as a part of CI or on shared servers. Dedicated local hardware is the only way I've ever gotten metrics that I really trust, especially for a project like signac that is heavy on I/O. The results from Quansight on GitHub Actions were extremely helpful for calibrating my own experience of annoyance with CI benchmarks in the past. I don't think the metrics they see for false positives and highly noisy data are good enough for what the signac project has needed in the past -- local benchmarks are much less variable in my experience.
Hi folks, @matthewfeickert asked me to leave my 2 cents here a few days ago. Basically 2 things:
Dedicated local hardware is the only way I've ever gotten metrics that I really trust, especially for a project like signac that is heavy on I/O.
This is 100% correct. Here are the benchmarks we ran a few years ago in poliastro: the noisy lines are from my own laptop (supposedly while not running anything else), and the almost-straight line is from a cheap dedicated server we rented from https://www.kimsufi.com/. Slower, but infinitely more useful.
I have mixed feelings about it. It can be difficult to make asv do what I want sometimes, and the project's development has been rather slow.
Recently they got a grant (https://pandas.pydata.org/community/blog/asv-pandas-grant.html) and managed to revamp the CI and make a release. The project has not seen many commits since then, so I agree it's not very active, but I'm not aware of any alternatives. The closest one would be https://github.com/ionelmc/pytest-benchmark/, but it's equally inactive.
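For completeness, pytest-benchmark plugs into ordinary pytest tests through a benchmark fixture; something like the sketch below, where the timed function is just a stand-in:

```python
# test_perf.py -- minimal pytest-benchmark usage; run with `pytest`.
# The benchmark fixture calls the function repeatedly and reports statistics.
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def test_fib(benchmark):
    result = benchmark(fib, 20)
    assert result == 6765
```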
Following up on @astrojuanlu's excellent points, I was talking with @gordonwatts at the 2022 IRIS-HEP Institute Retreat about this and he mentioned that he might have some dedicated AWS machines that we could potentially use (or at least trial a demo). Gordon, if you can elaborate on this as my memory from last week isn't as clear as it was the next day.
We have an account that is connected with IRIS-HEP for benchmarking (@masonproffitt and I were going to use this for some benchmarking for our ADL Benchmark paper work, but it didn't happen). This is still active. Only Mason and I have access. But you get a dedicated machine of a certain specific size (at least, that is what the web interface says). So if one can basically build a script that does the complete install and then runs the test, this can be a cheap-ish way to run these.
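To make that concrete, a driver script along the following lines could handle the "complete install, then run" step on such a machine. This is a hypothetical sketch: the repository URL is pyhf's, but the existence of an asv suite there, and the exact command sequence, are assumptions.

```python
# run_benchmarks.py -- hypothetical driver for a dedicated benchmark machine.
# Clones the repository, installs asv, runs the suite, and builds the HTML report.
import subprocess


def run(cmd, **kwargs):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True, **kwargs)


def main():
    run(["git", "clone", "https://github.com/scikit-hep/pyhf.git"])
    run(["python", "-m", "pip", "install", "asv"])
    # Record the machine's hardware description non-interactively.
    run(["asv", "machine", "--yes"], cwd="pyhf")
    # Run the benchmark suite; a commit range argument would backfill history.
    run(["asv", "run"], cwd="pyhf")
    # Render the static HTML results site.
    run(["asv", "publish"], cwd="pyhf")


if __name__ == "__main__":
    main()
```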
Description
See the Python SDK: https://github.com/honeycombio/libhoney-py
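As a rough sketch of how that SDK could be used to report timings (the write key, dataset name, field names, and the timed operation are all placeholders):

```python
# Hypothetical sketch: send one benchmark timing to Honeycomb with libhoney.
import time

import libhoney


def expensive_operation():
    # Stand-in for the code path being benchmarked.
    return sum(i * i for i in range(10**6))


libhoney.init(writekey="YOUR_WRITE_KEY", dataset="pyhf-benchmarks")

start = time.perf_counter()
expensive_operation()
elapsed = time.perf_counter() - start

event = libhoney.new_event()
event.add_field("benchmark", "expensive_operation")
event.add_field("duration_s", elapsed)
event.send()

libhoney.close()
```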
Workflow I had in mind
In general, we won't merge in PRs unless we can fix the slow stuff.
@ismith: