Closed jbrockmendel closed 4 years ago
Hi @jbrockmendel, this in particular is probably a performance bug that was fixed here: 4a42afdc2f3bb21f1685ac5c49b806ae34c36355
If you're able to confirm this fixes the problem by trying with pytest master, that'd be great. We can also consider backporting it to 5.4.x as it might be a little more time until 6.0.0 is released with that fix.
Thanks for the tip. Running on master:
- the time for pytest pandas/tests --collect-only is roughly cut in half (35 seconds)
- pytest pandas/tests --lf still collects everything even if just running 2 tests
- pytest_collection_modifyitems (now located in pytest.fixtures) totals .003 seconds internally, 2.293 cumulative
- Sorting by "tottime", nothing stands out: _pytest.mark.structures.__getitem__ at 2.064 seconds for >4M calls, _pytest.python.__init__ at 1.02 seconds for 144k calls

Is there anything we can do on our end to keep this from growing with the test suite? What I have in mind here is antipatterns, likely fixture-related, that we should watch out for.
the time for pytest pandas/tests --collect-only is roughly cut in half (35 seconds)
OK, most of it is likely from the fix, but some other optimizations might have contributed as well.
pytest pandas/tests --lf still collects everything even if just running 2 tests
Yep, it works that way.
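Since --lf still performs a full collection, one possible workaround is to read the recorded failures and pass them to pytest as explicit node IDs, so that only the files named on the command line are collected. The following is a minimal sketch, not something proposed in this thread. It assumes the default cache location (.pytest_cache/v/cache/lastfailed under the root directory) and that the file is a JSON object keyed by node ID; the helper name last_failed_node_ids is hypothetical.

```python
import json
from pathlib import Path


def last_failed_node_ids(rootdir="."):
    """Return the node IDs recorded by pytest's last-failed cache.

    Assumption: the cache lives at <rootdir>/.pytest_cache/v/cache/lastfailed
    and holds a JSON object whose keys are node IDs.
    """
    cache_file = Path(rootdir) / ".pytest_cache" / "v" / "cache" / "lastfailed"
    if not cache_file.is_file():
        # No failures recorded yet, or the cache directory was moved or cleared.
        return []
    with cache_file.open() as handle:
        recorded = json.load(handle)
    return sorted(recorded)


# Possible usage (requires pytest to be installed):
#     import pytest
#     node_ids = last_failed_node_ids(".")
#     if node_ids:
#         pytest.main(node_ids)
```

Whether this is actually faster depends on how many distinct files the failures are spread over, so it is worth timing against a plain --lf run before relying on it.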
Sorting by "tottime", nothing stands out:
We have been working on reducing the overhead, but all the easy optimizations have been made, so now it's progressively harder. I'd be interested to see the full cProfile (sorted by cumtime), on a non-synthetic large test suite like pandas.
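For anyone wanting to produce such a profile, here is a rough sketch of one way to capture collection under cProfile and print the entries sorted by cumulative time. The helper name profile_call and its parameters are made up for illustration; only the standard-library cProfile and pstats modules are used, and the pytest invocation in the trailing comment assumes pytest is installed.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort_key="cumulative", limit=25, **kwargs):
    """Run func under cProfile and return (result, formatted report).

    sort_key is passed to pstats, e.g. "cumulative" (cumtime) or "tottime".
    limit caps the number of rows printed in the report.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort_key).print_stats(limit)
    return result, buffer.getvalue()


# Possible usage for collection only (requires pytest):
#     import pytest
#     _, report = profile_call(pytest.main, ["pandas/tests", "--collect-only", "-q"])
#     print(report)
```

Passing sort_key="tottime" instead gives the view used earlier in the thread, which highlights functions that are expensive in their own body rather than through their callees.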
Is there anything we can do on our end to keep this from growing with the test suite?
I don't have a good answer for that, maybe others can chime in here.
You can try to disable plugins you don't use, although the costly ones are usually the most useful ones.
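As a sketch of how that experiment might look, the snippet below builds a collect-only argument list with selected plugins disabled through pytest's -p no:NAME option. The function name collect_only_args is hypothetical, and the "doctest" plugin name in the usage comment is only an example; which plugins are safe to disable depends on the project. Note that disabling cacheprovider would also remove --lf.

```python
def collect_only_args(test_path, disabled_plugins=()):
    """Build pytest arguments for a quiet collect-only run.

    Each name in disabled_plugins is turned into a "-p no:NAME" pair,
    which asks pytest not to load that plugin.
    """
    args = [test_path, "--collect-only", "-q"]
    for name in disabled_plugins:
        args += ["-p", f"no:{name}"]
    return args


# Possible usage (requires pytest):
#     import pytest
#     pytest.main(collect_only_args("pandas/tests", ["doctest"]))
```

Timing the same run with and without each plugin should show whether any single plugin contributes meaningfully to collection time.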
I'd be interested to see the full cProfile (sorted by cumtime), on a non-synthetic large test suite like pandas.
The pandas test suite clocks in at 90292 tests, which for me takes 67 seconds to run pytest pandas/tests --collect-only. This becomes troublesome when I want to re-run just a few failed tests with --lf, and doesn't appear amenable to parallelization.
When profiling with cProfile, it looks like just over half the time is spent in cacheprovider.pytest_collection_modifyitems.
Are there suggested patterns for profiling and/or optimizing the test collection?