pytest-dev / pytest

The pytest framework makes it easy to write small tests, yet scales to support complex functional testing
https://pytest.org
MIT License

🧪 Unmeasure coverage in tests expected to fail #12531

Closed: webknjaz closed this 3 months ago

webknjaz commented 3 months ago

These tests are known to execute only partially or not at all, so we always get incomplete, missing, and sometimes flaky coverage in the test functions that are expected to fail.

This change updates the coverage.py config to prevent said tests from influencing the coverage level measurement.
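
(The actual diff isn't quoted in this thread. As a rough sketch of the kind of rule under discussion, assuming the coverage.py settings live in pyproject.toml rather than a .coveragerc, and with an illustrative regex that is not necessarily the one the PR added, it could look like the snippet below. exclude_also requires coverage.py 7.2.0 or newer; older versions would extend exclude_lines instead.)

```toml
# Hypothetical sketch: the pyproject.toml placement and the exact regex are assumptions.
[tool.coverage.report]
exclude_also = [
    # Match explicit xfail decorators; coverage.py excludes the whole
    # decorated function when a matched line is a decorator.
    '^\s*@pytest\.mark\.xfail',
]
```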

webknjaz commented 3 months ago

FTR, this is motivated by my attempt to improve the Codecov config. While looking into it, I noticed that there are uncovered lines in the tests, meaning that some tests might never run in CI and we'd never notice. It shouldn't be like that: https://nedbatchelder.com/blog/202008/you_should_include_your_tests_in_coverage.html.

I hope this will help us start requiring 100% coverage on tests, following the best practices outlined in https://adamj.eu/tech/2019/04/30/getting-a-django-application-to-100-percent-coverage/. This won't affect non-test code for now, but it seemed like a good place to start.

bluetech commented 3 months ago

I like the idea; however, I fear it might over-ignore some stuff. If we don't have too many xfails, I'd feel better with just manual pragma: no cover comments...

webknjaz commented 3 months ago

@bluetech yeah, I had a feeling there might not be enough buy-in. However, I believe this will not over-ignore stuff. Yes, there is some coverage on the first lines of some xfail tests, but it's not really useful. Over time, more or fewer lines may start failing there, and that will influence PR-reported coverage changes here and there for no reason. It's even worse when the tests are very flaky.

It might make sense to measure coverage when working on fixing said xfail tests, but in that situation, one could just as well comment out or remove the mark while debugging.

I've given this enough thought over the years to conclude that there's no case where collecting coverage on an xfailing test is useful. Do you have an example of where this would over-ignore tests? I intentionally wrote the regex so that it matches explicit xfail mark decorators applied to all the parametrized variants, but not the cases where the mark is applied to only some parametrize factors. It also leaves out module-level pytestmark, which I think is fair.
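
(To illustrate the distinction with a hypothetical test module, not taken from the pytest suite: a regex anchored on the @pytest.mark.xfail decorator line would exclude the first test below entirely, while the second stays measured because its xfail mark is attached to a single parametrize factor and never appears as a decorator. A module-level pytestmark = pytest.mark.xfail(...) assignment likewise has no leading "@", so it wouldn't match either.)

```python
import pytest


@pytest.mark.xfail(reason="illustrative known failure")
def test_known_failure():
    # A decorator-anchored exclusion regex drops this whole function from
    # coverage measurement, which is the case being targeted here.
    assert 1 + 1 == 3


@pytest.mark.parametrize(
    "value",
    [
        1,
        # The xfail mark on this single case is not a decorator line, so the
        # test function below remains part of the coverage report.
        pytest.param(-1, marks=pytest.mark.xfail(reason="negatives unsupported")),
    ],
)
def test_positive_values(value):
    assert value > 0
```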

bluetech commented 3 months ago

For a project that is using pytest, I agree there is little risk. But here we are not only using pytest; we are also developing pytest and testing pytest, and my "over-ignoring" concern is about the latter two. It's probably not a real issue, but it feels like something that might start ignoring some unintended piece of code without us noticing.

webknjaz commented 3 months ago

> without us noticing.

That's exactly the concern I'm trying to minimize here: with random, unrelated coverage changes, people get trained to ignore the reports altogether and never look into them. Just look at the effect of Codecov's reputation for being flaky: it had been broken on main for 4.5 months (#11921) until I noticed and addressed it in #12508 + #12516 just 4 days ago. And I only knew to check for it because of my previous experience watching these things closely (it was so bad around the v4 release that I even issued an advisory in @aio-libs recommending avoiding upgrades: https://github.com/orgs/aio-libs/discussions/36). Most people either don't know or have learned to ignore whatever Codecov shows them.

And I think this is a low-effort improvement that would enhance the experience greatly. I just can't imagine a situation where we'd want xfailing tests to remain measured. Of course, for the lines we know always get executed, it would be nice to keep the coverage, but then we'd have to add a no-cover pragma to tens or hundreds of lines, since coverage.py has no concept of ignoring coverage from a certain line to the end of the function.

Do you have any suggestions on handling things like https://app.codecov.io/gh/pytest-dev/pytest/blob/main/testing%2Ftest_debugging.py#L388?

bluetech commented 3 months ago

@webknjaz OK, I trust your judgment on this. Thanks for improving the coverage situation!

bluetech commented 3 months ago

> I hope this will help us start requiring 100% coverage on tests

Just a heads up from me on this -- while I think reaching 100% coverage is a laudable goal, I think that enforcing 100% is somewhat harmful. Sometimes you just want to do something without doing it perfectly...

webknjaz commented 3 months ago

> I think that enforcing 100% is somewhat harmful

I don't think enforcing it is harmful, as long as you use # pragma: no cover for all the places you consciously want to exclude from the measurement. Especially for tests.
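
(For illustration, a conscious exclusion might look like the hypothetical snippet below; the helper and test names are invented and not from the pytest suite. The pragma on the def line tells coverage.py to exclude that whole function, so it doesn't count against a 100% requirement.)

```python
import sys


def _windows_path_separator():  # pragma: no cover
    # Consciously excluded: this helper only runs on Windows CI runners and
    # would otherwise show up as uncovered lines on Linux.
    return "\\"


def test_path_separator_is_known():
    sep = _windows_path_separator() if sys.platform == "win32" else "/"
    assert sep in ("/", "\\")
```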

webknjaz commented 3 months ago

@nicoddemus @RonnyPfannschmidt @The-Compiler do you want to post your opinions here before this is merged?

patchback[bot] commented 3 months ago

Backport to 8.2.x: 💚 backport PR created

✅ Backport PR branch: patchback/backports/8.2.x/1a8394ed8964a43e2fe766df3a48fa0573362512/pr-12531

Backported as https://github.com/pytest-dev/pytest/pull/12551

🤖 @patchback I'm built with octomachinery and my source is open: https://github.com/sanitizers/patchback-github-app.

webknjaz commented 3 months ago

Thanks for the feedback, everyone!