sandialabs / pyGSTi

A Python implementation of Gate Set Tomography
http://www.pygsti.info
Apache License 2.0

Faster unit tests/notebooks #380

Closed sserita closed 3 months ago

sserita commented 7 months ago

**Is your feature request related to a problem? Please describe.**
The unit and notebook regression tests take too long.

**Describe the solution you'd like**
Faster unit tests. This could be achieved by using different inputs or options that don't affect coverage but do affect runtime. Simple examples for GST-related tests include using lower germ lengths, smaller gate sets, etc. This was already done for the test/test_packages/algorithms tests in #372, but we should do this across the whole codebase.

**Task list**
Below are task lists made by running `pytest` with `--durations 0` and taking the most expensive tests (a sketch of the command is given after the lists).

Everything in test/unit around 5s or above:

Everything in test/test_packages around/over a minute:

Everything in jupyter_notebooks around/over a minute:
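For reference, per-test durations like these come from pytest's built-in duration report; an invocation along the following lines (test paths illustrative) reproduces them:

```bash
# Report the duration of every test (0 = no cutoff), slowest first.
# Point pytest at whichever suite you want to profile.
pytest test/unit --durations 0
pytest test/test_packages --durations 0
```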

coreyostrove commented 7 months ago

Thanks for compiling this, @sserita! Regarding the contents of test_packages, I anticipate a few of these being addressed as part of #373. The changes on that branch broke some of the unit tests in that module, so since I was already in the code fixing them, I decided to streamline some of them at the same time.

sserita commented 7 months ago

Great! Whenever #373 is ready, I'll go through and check off whichever ones you've already gotten to.

coreyostrove commented 7 months ago

Question regarding the runtimes above: what system and pyGSTi version were these timings collected on? I.e., was this done locally on your system (presumably an Apple Silicon MacBook) or on the GitHub runners? I ask because I noticed some of the runtimes were significantly longer than I recalled seeing on my system (a Windows laptop with a Xeon W-10855M processor) following the updates in #372.

For example, looking at the second test in test/unit (test_gaugeopt.py::LGSTGaugeOptSPAMPenaltyTester::test_gaugeopt_no_target), this takes just 8.14s on my system, so ~16x faster (it is indeed still one of the longer tests, so that much is consistent). The M-series MacBooks are pretty spiffy, so while I wouldn't expect exactly the same behavior (I actually would've expected it to be faster than my system), 16x is a bigger gap than I would have expected.

I am running this on the bugfix-stricter-line-label-enforcement branch, but as far as I recall I haven't made any changes to the tests in question since #372.

sserita commented 7 months ago

The test/unit timings were collected on my MacBook Pro; the other two were run on Dynobird. I did run them with the `--dist loadscope` flag, so that could be a source of the timing difference if you are running them serially, since each test then gets more cores to itself.

Edit: I just ran this without `--dist loadscope` and got 4.6s for that test instead. Maybe I will rerun these without the distributed flag for more consistent timings and update the tasks.
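For concreteness, the two kinds of runs being compared look roughly like this (worker count illustrative; the distributed run assumes the pytest-xdist plugin is installed):

```bash
# Serial run: a single worker, so each test has the whole machine to itself
# (e.g. OpenMP-backed linear algebra can use all cores).
pytest test/unit --durations 0

# Distributed run: pytest-xdist spreads tests across 4 workers;
# --dist loadscope keeps tests from the same module/class on one worker.
pytest test/unit -n 4 --dist loadscope --durations 0
```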

coreyostrove commented 7 months ago

I was running these on a single thread, so I didn't have any dist flags set. But we do use that flag on the runners (and probably should, based on what I understand about it), so maybe I should too? If I recall correctly, I had previously decided that flag is more relevant for the multi-threaded notebook regressions than for the standard unit tests: the notebook regression tests really need all of the cells in a notebook to run on a single worker, whereas the standard unit tests should generally be more robust in this regard.

Edit: I just re-ran the tests on my machine with multiple threads and the `--dist loadscope` flag set, and it did not meaningfully change the timings. Maybe there is something weird going on with pytest-xdist on Macs?

sserita commented 7 months ago

Yea, agreed, something seems wonky - or maybe I was running something else in the background at the same time. Timings are updated - do those seem more reasonable across the board? These runs were without `--dist loadscope` and on Dynobird now.

coreyostrove commented 7 months ago

I just spot checked a few of the updated timings and these seem much more reasonable.

coreyostrove commented 7 months ago

One thing that might be worth looking into while trying to speed up the testing process is our current list of dependencies installed via the testing flag. A good chunk of the overall CI time goes to setting up the runners and installing packages, so removing any unneeded dependencies would help here.
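As a sketch of how one might audit this, assuming the test dependencies live in an extra named `testing` (the exact extra name and its contents are defined in pyGSTi's packaging metadata):

```bash
# Compare a minimal install against the full testing extra to see
# exactly which packages the extra pulls in ('testing' assumed above).
pip install -e . && pip freeze > base.txt
pip install -e ".[testing]" && pip freeze > with-testing.txt
diff base.txt with-testing.txt
```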

coreyostrove commented 5 months ago

With #373 merged in, I re-ran the unit tests to get updated timings and see which bottlenecks were still unresolved. Note that I cheated a bit and re-ran these on songthrush (dynobird and quail were in use), so these aren't a perfect 1-for-1 comparison, but they should hopefully be close enough. I ran the tests on a single thread (though some of the linear algebra was clearly utilizing OpenMP, based on htop); a sketch for pinning threads follows the lists below.

Everything in test/unit around 5s or above:

Everything in test/test_packages around/over 5s (note: the previous list was over a minute):

Everything in jupyter_notebooks around/over 45 seconds:
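For anyone reproducing these timings, pinning the thread counts makes single-threaded runs more comparable across machines (a sketch; which variables matter depends on the OpenMP/BLAS backend in use):

```bash
# Pin OpenMP and common BLAS backends to one thread so linear-algebra-heavy
# tests don't silently parallelize and skew the per-test timings.
OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1 \
    pytest test/unit --durations 0
```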

I have started taking a crack at some of these on a new branch called `feature-faster-testing-stuff`, and will check things off as I make progress.

sserita commented 3 months ago

This is mostly completed with the merge of #403.