pypa / twine

Utilities for interacting with PyPI
https://twine.readthedocs.io/
Apache License 2.0
1.6k stars 306 forks

Replace Codecov for coverage reporting #658

Closed bhrutledge closed 1 year ago

bhrutledge commented 4 years ago

As noted in https://github.com/pypa/twine/pull/651#issuecomment-640866750, Codecov is currently having issues with GitHub Actions (see https://github.com/codecov/codecov-action/issues/37), which seems to make it unusable. This is not the first issue we've had with Codecov. Is there another service and/or GitHub action we could use?

sigmavirus24 commented 4 years ago

There have been other services but Codecov has been the best of the worst for a while. We'd have to go through the various services and try them out.

What other issues have we had with Codecov?

bhrutledge commented 4 years ago

> What other issues have we had with Codecov?

Without taking the time to back this up with details, I've seen it be flaky with reports in the past, and personally, I find it a bit confusing to read and configure, with spurious failing checks.

A search of the GitHub Marketplace might be a good place to start for alternatives. I've been curious about Codacy and Code Climate.

sigmavirus24 commented 4 years ago

I've had horrible experiences with Code Climate in the past, and it appears that we'd need to pay for it now. Perhaps if we had good justification for using it, we could convince the PSF to pay for it, but I'm not sure it would justify the costs. Based on Codacy's docs, I don't think they measure code coverage, and if they do, they don't list coverage.py as supported. That's worrisome, but not a blocker.

di commented 4 years ago

I'm not really a fan of any of the external coverage services. Codecov specifically has felt very flaky (even before issues like codecov/codecov-action#37), and I've also found it confusing to interpret.

If we really care about maintaining code coverage, we could just run coverage.py in CI and assert on some coverage percentage to pass CI. Or, if we don't care that much, we could just print the report, which is basically equivalent to what we're using Codecov for now. (This is also what I advised we do in #651.)
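A minimal sketch of what that could look like as a GitHub Actions step (the step name and the 93% threshold are illustrative, not current project settings):

```yaml
# Hypothetical workflow step: fail CI if total coverage drops below a
# chosen percentage, with no external service involved.
- name: Check coverage
  run: |
    coverage run -m pytest
    coverage report --fail-under=93
```

`coverage report --fail-under` makes the command exit non-zero when total coverage is below the threshold, which is enough to fail the job.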

bhrutledge commented 4 years ago

But...what about the badge?! 😉

Seriously, though...

While I realize a high coverage number doesn't indicate the quality of that coverage, I do think there's value in maintaining (and increasing) it, as a quick indication of testing practices, and as an initial indication of how well a PR is tested.

Case in point: https://github.com/pypa/twine/pull/652#discussion_r438010724

bhrutledge commented 4 years ago

Re: Codacy, a quick search through their docs yields support for coverage.py. However, it's also a general-purpose code-quality tool, indicating things like security and code style issues. Perhaps more than we want to adopt.

di commented 4 years ago

> While I realize a high coverage number doesn't indicate the quality of that coverage, I do think there's value in maintaining (and increasing) it, as a quick indication of testing practices, and as an initial indication of how well a PR is tested.

Right, which is why I'm saying if we care about this, we should pick a % and start asserting on it in CI.

sigmavirus24 commented 4 years ago

> Right, which is why I'm saying if we care about this, we should pick a % and start asserting on it in CI.

Right. This is the old-school way that I never understood why anyone deviated from. pytest-cov has --cov-fail-under, and I think coverage has its own options.
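For reference, coverage.py's own spelling of that check lives in its config file (the 93% value is illustrative):

```ini
# .coveragerc — make `coverage report` exit non-zero below the threshold
[report]
fail_under = 93
```

With pytest-cov, the command-line equivalent would be `pytest --cov --cov-fail-under=93`.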

Further, Codacy's docs (not their README) don't list coverage.py. I see support for Bandit, Prospector (which, last I heard, was abandoned), Pylint, and "metrics", which I have never heard of before. No mention of coverage.py. Something's out of date somewhere, which means setting up Codacy will be harder than just adding a flag.

Also, badges can be (and have been) used to track people across the web. I'm surprised they've survived this long, because they frankly never seem accurate, thanks to a host of problems.

bhrutledge commented 4 years ago

> if we care about this, we should pick a % and start asserting on it in CI.

> This is the old school way that I never understood why anyone deviated from.

I like this in theory (maybe coupled with publishing the HTML report for easy review), but on its own, I don't think it accomplishes the goal of ensuring that PRs maintain coverage. For example:

  1. We set the coverage threshold to 93%
  2. Alice submits a PR w/o tests: CI fails because coverage drops to 92%
  3. I merge a PR that adds tests to increase coverage to 95%, but doesn't bump the threshold
  4. Bob submits a PR w/ insufficient testing: CI passes because coverage only drops to 94%
  5. I merge Bob's PR w/o noticing the missing tests, make a release, and Bob's feature breaks for users

To avoid that, I think we'd need to manually bump the threshold. I believe that's one of the features of Codecov and other services.

Or, we could just get to 100% coverage. 😉

di commented 4 years ago

I don't think that's too bad. We're currently at 93% so I don't imagine us having to deal with this very much.

Also, speaking from experience, coverage is not a panacea and we can still ship broken features with 100% test coverage 😉

> To avoid that, I think we'd need to manually bump the threshold. I believe that's one of the features of Codecov and other services.

Assuming they a) are working and b) support our toolset. 😉

sigmavirus24 commented 4 years ago

> Also, speaking from experience, coverage is not a panacea and we can still ship broken features with 100% test coverage 😉

I completely agree. 100% code coverage often means overly specific tests that break when a refactor is necessary. It then blocks the refactor, as the person doing that work has to figure out if a test is actually protecting something or just there to hit the arbitrary goal of 100% coverage.

sigmavirus24 commented 4 years ago

> maybe coupled with publishing the HTML report for easy review

This is an intriguing idea. I wonder if you'd share what you're thinking you'd do to automate publishing this somewhere.

bhrutledge commented 4 years ago

Agreed re: the caveats of 100% coverage.

Re: publishing an HTML report, I wondered if GitHub Actions artifacts would help. A GitHub search led me to this example, where the report is downloadable as a zip file:

https://github.com/stacked-git/stgit/actions/runs/129329080

Certainly not as convenient as browsing online; that might require setting up separate static site hosting, which could accommodate multiple branches/commits. It's probably more effort than it's worth.
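For the record, a sketch of the artifact approach (step names, paths, and the action version are assumptions):

```yaml
# Hypothetical workflow steps: generate the HTML report and attach it
# to the workflow run as a downloadable zip.
- name: Generate HTML coverage report
  run: |
    coverage run -m pytest
    coverage html   # writes the report to htmlcov/ by default
- name: Upload HTML coverage report
  uses: actions/upload-artifact@v2
  with:
    name: coverage-report
    path: htmlcov
```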

sigmavirus24 commented 4 years ago

> It's probably more effort than it's worth.

That's a shame. Thanks for chasing it down, at least. That sounded like it'd be a cool (and probably less flaky) alternative to the various coverage services.

bhrutledge commented 3 years ago

Had a chat with @nedbat about this at Boston Python Office Hours; he had a similar wish, without a solution beyond the downloadable ZIP artifacts mentioned above. For now, I'm using this issue as a place to capture research and ideas.

I found https://github.com/marketplace/actions/html-preview "which comments on your PRs with a link to preview HTML files directly in the browser." However, it looks like that just renders files in the repo using a proxy.

Maybe we could publish reports to GitHub Pages in a subdirectory, e.g. /actions/pull/728/htmlcov or /actions/9ffb61790e27c6d8a5d4f17c5ae060818049da35/htmlcov. See https://github.com/marketplace/actions/github-pages-action and https://github.com/marketplace/actions/deploy-to-github-pages.
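If we went that route, the first of those actions could be used roughly like this (the destination_dir layout mirrors the paths above and is purely illustrative):

```yaml
# Hypothetical step using peaceiris/actions-gh-pages to publish the
# HTML report to a per-PR subdirectory of the gh-pages branch.
- name: Publish coverage report to GitHub Pages
  uses: peaceiris/actions-gh-pages@v3
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
    publish_dir: ./htmlcov
    destination_dir: actions/pull/${{ github.event.number }}/htmlcov
```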

Might also be worth taking a closer look at https://github.com/marketplace?type=actions&query=coverage to find prior art.

Ned suggested that, in addition to the coverage report, the output of https://github.com/Bachmann1234/diff_cover would be handy. That got me thinking that showing https://github.com/pytest-dev/pytest-html/ output would also be a nice improvement for reviewing test failures, although it looks like https://github.com/marketplace/actions/junit-report-action will add them directly to the PR.
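For context, diff_cover consumes a coverage XML file and reports coverage on just the lines changed relative to a base branch. A hypothetical CI step (the branch name and threshold are assumptions):

```yaml
- name: Coverage on changed lines only
  run: |
    coverage run -m pytest
    coverage xml   # writes coverage.xml
    diff-cover coverage.xml --compare-branch=origin/main --fail-under=80
```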

bhrutledge commented 2 years ago

For future reference, Hynek wrote about switching from Codecov to uploading coverage.py artifacts in How to Ditch Codecov for Python Projects.