nanograv / PINT

PINT is not TEMPO3 -- Software for high-precision pulsar timing

Slow tests #870

Open paulray opened 3 years ago

paulray commented 3 years ago

When using make test it now reports the slowest 20 tests. This matters because our CI resources are limited and Travis is taking a really long time to run. So we need to do some work to speed up the tests where possible.

=============================================================== slowest 20 durations ================================================================
171.51s call     tests/test_fitter.py::test_ftest_nb
70.75s call     tests/test_fitter.py::test_ftest_wb
69.66s call     tests/test_gls_fitter.py::TestGLS::test_gls_fitter
68.22s call     tests/test_gls_fitter.py::TestGLS::test_gls_compare
65.44s call     tests/test_parameters.py::TestParameters::test_START_FINISH_in_par
51.22s call     tests/test_utils.py::test_dmxparse
46.12s call     tests/test_ell1h.py::TestELL1H::test_J0613_H4
45.03s call     tests/test_event_optimize.py::TestEventOptimize::test_result
43.73s call     tests/test_ell1h.py::TestELL1H::test_J0613_STIG
34.35s call     tests/test_photonphase.py::test_result
33.35s call     tests/test_gls_fitter.py::TestGLS::test_whitening
33.27s call     tests/test_ecorr_average.py::TestEcorrAverage::test_ecorr_average
29.81s call     tests/test_fbx.py::TestFBX::test_derivative
22.55s call     tests/test_widebandTOA_fitting.py::TestWidebandTOAFitter::test_fitting_no_full_cov
22.24s call     tests/test_widebandTOA_fitting.py::TestWidebandTOAFitter::test_fitting_full_cov
22.04s setup    tests/test_ell1h.py::TestELL1H::test_J0613_H4
21.86s call     tests/test_B1855_9yrs.py::TestB1855::test_derivative
20.85s call     tests/test_wls_fitter.py::Testwls::test_wlf_fitter
20.38s setup    tests/test_ecorr_average.py::TestEcorrAverage::test_ecorr_average
18.49s setup    tests/test_fbx.py::TestFBX::test_B1953_binary_delay

@bshapiroalbert can you see about the test_ftest ones? Those are the two slowest tests. If they can be reworked to use much smaller numbers of TOAs that would be great.

bshapiroalbert commented 3 years ago

Hey @paulray, I will see if I can make those a bit faster by the end of this week, sorry they are so slow. I think I just grabbed a par and tim file that were already accessible in the PINT repo and didn't look too much into the speed or number of TOAs.

aarchiba commented 3 years ago

I filed a bug #869 simultaneously with a load of suggestions for speeding up tests.

aarchiba commented 3 years ago

I can't speak for any of these specifically, but many of our tests load a full .tim file when five TOAs in a string would do, and the string version would be clearer because the TOAs would be visible right where the test is. Maybe I should see if I can supply better tools for this kind of thing?

aarchiba commented 3 years ago

Astropy has a mechanism where you can flag tests as "slow" and then they won't be run unless requested. Perhaps we could establish a workflow of "grab a slow test, write a fast one that you think tests the same thing, then flag the original one as slow and keep it around just in case"?

kerrm commented 3 years ago

I think the slow flag is a good idea. A perfect example is "test_event_optimize", which is now the third slowest call. It runs an MCMC, but there's really no way to avoid this if you want to test an MCMC piece of code. But it's reasonable not to run it every time.

Practically I don't know how this would work with CI, though. Perhaps the slow tests get invoked manually, once a PR has "settled" and there are no new commits? And then we have locally a "make test" and a "make slow-test" or something. (Or maybe "make test" should run everything and "make fast-test" skips the slowies.)

paulray commented 3 years ago

That sounds like a great idea. Right now certain tests are skipped if the environment doesn't support them, so this could probably be implemented easily with an environment variable setting or something, e.g. RUN_SLOW_TESTS=1.
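
Something like this rough sketch might work (the test name here is just a placeholder):

import os

import pytest

# Slow tests are skipped unless the user opts in with RUN_SLOW_TESTS=1.
RUN_SLOW = os.environ.get("RUN_SLOW_TESTS") == "1"

@pytest.mark.skipif(not RUN_SLOW, reason="set RUN_SLOW_TESTS=1 to run slow tests")
def test_long_running_fit():
    ...  # expensive test body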

aarchiba commented 3 years ago

It is given as an example in the pytest documentation here: https://docs.pytest.org/en/stable/example/simple.html

You'd run it as pytest -m slow, and we could set this up as appropriate in the Makefile and/or CI.

Astropy was, when we were using Travis, set up to run the slow tests no more than once a day, as a cron job. There are options.
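
Roughly sketched (assuming we register a slow marker in setup.cfg or pytest.ini; the test name is hypothetical):

import pytest

# Mark the expensive test so it only runs when explicitly selected.
@pytest.mark.slow
def test_event_optimize_full_mcmc():
    ...  # long-running MCMC test body

Then pytest -m slow runs only the marked tests and pytest -m "not slow" skips them, so either invocation could be wired into the Makefile or the CI configuration.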

aarchiba commented 3 years ago

Additional suggestions from #869 (so I can close it):

Currently the PINT test suite takes many minutes to run. While it is possible to run individual test files with pytest tests/test_thing.py and individual tests with pytest tests/test_thing.py::test_thing_works, running all the tests takes so long it disrupts development and discourages running the whole test suite.

There are tools for improving this:

#871 provides infrastructure and demonstrates how to use an ultra-simplified par file in a triple-quoted string.
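
Roughly the pattern #871 enables, sketched here with made-up parameter values (and assuming get_model and get_TOAs accept file-like objects such as io.StringIO):

import io

from pint.models import get_model
from pint.toa import get_TOAs

# Ultra-simplified par file in a triple-quoted string.
par = """
PSR J1234+5678
ELAT 10.0
ELONG 50.0
F0 100.0 1
PEPOCH 58000
DM 10.0
"""

# A handful of fake TOAs, visible right where the test is.
tim = """FORMAT 1
fake 1400 58000.0000000 1.0 gbt
fake 1400 58010.0000000 1.0 gbt
fake 1400 58020.0000000 1.0 gbt
"""

m = get_model(io.StringIO(par))
t = get_TOAs(io.StringIO(tim))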

kerrm commented 3 years ago

RE fixtures: I think a huge amount of test time is being spent setting up a fit and then testing something with it (e.g., dmxparse). This could certainly be improved in many cases by using fake TOAs rather than large datasets. But grouping those tests where possible and having them use the same fit object would also make a huge improvement. Could this be done without losing clarity, i.e. by keeping the tests grouped into sensible files?

aarchiba commented 3 years ago

It can be done, with some care. The recommended way to handle this sort of thing is to create a function annotated as a fixture:

@pytest.fixture
def fitter_with_stuff():
    m, t = get_model_and_toas(...)
    f = WLSFitter(t, m)
    f.fit_toas()
    return f

Then this is used by individual tests by using its name as an argument:

def test_stuff(fitter_with_stuff):
    assert fitter_with_stuff.stuff == "specific"

If you do it this way, fitter_with_stuff will be re-run for each test that uses it. If you use @pytest.fixture(scope="module") the fixture will be run only once for the whole module. (Expect pain and suffering if any test modifies the returned object.)

kerrm commented 3 years ago

That would definitely be a step in the right direction!

I agree RE pain and suffering about modifying the object. What about pickling the Fitter after f.fit_toas and having the fixture return "fitter_pickle_file" for the test to load and unpickle? And perhaps some support functions could handle that loading and unpickling for specific flavors of fitter: load_cached_b1821_fitter, load_cached_j0030_wideband_fitter, etc.
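
Something like this hedged sketch, where the par/tim names and the final check are placeholders and the fitter is assumed to pickle cleanly:

import pickle

import pytest

from pint.fitter import WLSFitter
from pint.models import get_model_and_toas

@pytest.fixture(scope="module")
def fitter_pickle_file(tmp_path_factory):
    # Fit once per module and cache the fitted object on disk.
    m, t = get_model_and_toas("NGC6440E.par", "NGC6440E.tim")
    f = WLSFitter(t, m)
    f.fit_toas()
    path = tmp_path_factory.mktemp("fitters") / "wls_fitter.pickle"
    with open(path, "wb") as handle:
        pickle.dump(f, handle)
    return path

def test_stuff(fitter_pickle_file):
    # Each test unpickles its own copy, so modifying it cannot affect other tests.
    with open(fitter_pickle_file, "rb") as handle:
        f = pickle.load(handle)
    assert f.resids is not None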


aarchiba commented 3 years ago

You can run the test suite as:

$ python -m cProfile -o profile $(which pytest)

and then the program snakeviz (pip install snakeviz) allows you to visualize the profiling results, so you can figure out where PINT is spending all its time.
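
For example, after the profiling run above:

$ pip install snakeviz
$ snakeviz profile    # opens an interactive view of where the time went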