perf: only call summary when the report will be used

jstewmon commented 1 year ago

👋 I would like to propose a small change to the control flow which precludes generating the coverage report if it is not going to be used by the given pytest run in which it was generated (i.e. it is going to be combined with other reports later).

I didn't add any new tests yet because I don't see any observable difference from the perspective of the tests. If there's any suggestions on how to support this change with a test, I'm happy to add tests to verify the behavior.

Waaay back when #34 was merged, the control flow changed to always generate a coverage report, even if it isn't used.

On small code bases, or code bases that collect all of their coverage in one step, this is probably fine, but on large code bases, it can take minutes to generate the report.

When used in conjunction with pytest-xdist, generating the report only leverages 1 CPU, leaving the rest idle. For example: if if the coverage report takes 2 minutes to generate, pytest -n 8 may complete in 5 minute, while pytest -n 8 --cov completes in 7 minutes, wasting 15 CPU minutes.

I have a private code base that runs its test suite across dozens of CI jobs, most of which use pytest-xdist to run tests in parallel. On average, running a test job with coverage enabled adds ~80 seconds. The aggregate cost of all the wasted CPU minutes has become significant. I'd like to optimize by generating an aggregate coverage report by combining all the .coverage files after all jobs have run.

jstewmon commented 1 year ago

@ionelmc Sorry for pestering, but do you have any feedback on this PR?

To add more color to the motivation for this change - the CPU savings on our project's test workload is substantial enough that we would consider maintaining a private fork.

Considering the longevity of the current behavior, I suppose generating the report unnecessarily is not a prevalent concern. So, I recognize this may not be a priority for pytest-cov.

TIA 🙏

nedbat commented 1 year ago

Rather than a private fork, have you considered not doing any reporting or fail-under with pytest-cov at all, and instead using coverage.py directly? That will give you more control over the choreography between the phases.

jstewmon commented 1 year ago

Hi @nedbat, thanks for your reply. I initially tried running pytest under coverage, but I wasn't able to get it to collect coverage data.

In my search for a working setup, it seemed like most people who had tried to get coverage working with pytest-xdist had eventually given up and used pytest-cov instead. So, I shifted my focus to trying to figure out why pytest-cov was adding so much time to our test jobs, even when we disabled reporting, which led to this PR.

Prompted by your reply, I made a new attempt to use coverage.py directly, and I got it working! The problem ended up being a bad source entry in our coverage configuration. The errant source setting did not preclude coverage being captured by pytest-cov.

After removing the errant source setting, adding [run] parallel = True, and installing coverage_enable_subprocess, I'm getting the expected coverage data when running COVERAGE_PROCESS_START=$PWD/.coveragerc coverage run -m pytest -n 4 tests/unit.

So, thank you again for your reply! I think using coverage.py directly will work for us and is a much better option than maintaining a fork of pytest-cov just to defer reporting.

ionelmc commented 1 year ago

Will release in 4.1 later today if no objections.

pytest-dev / pytest-cov

perf: only call summary when the report will be used #589