Code coverage dropped significantly after switching to nextest

milesj commented 1 year ago

I switched to nextest from cargo test because it's 50% faster, which is great for local development. However, when I landed the change upstream, my code coverage dropped significantly. You can see the report here: https://app.codecov.io/github/moonrepo/moon/commit/d77d7bf00993a8e1b62282202bdf8b392e8d895e

I'm not sure if this is a problem with nextest, or my numbers were incorrect before, but I can't seem to "fix this". Right now I'm using grcov, but I also tried llvm-cov and my coverage dropped even more.

Right now, I may just revert this and use cargo test in CI, and nextest locally.

taiki-e commented 1 year ago

For llvm-cov, maybe related to https://github.com/rust-lang/rust/issues/91092. Could you try with NEXTEST_TEST_THREADS=1 environment variable?

sunshowers commented 1 year ago

re grcov: So I'm not sure how it works, but my suspicion is that the fact that nextest runs tests in parallel might be causing test coverage data to clobber each other, such that the last test that runs wins. It would be interesting to see if setting test-threads to 1 per taiki-e's comment (or using --test-threads 1) addresses this.

milesj commented 1 year ago

Using 1 thread seemed to bump it back up by about 9% overall, which is better than nothing. https://app.codecov.io/github/moonrepo/moon/commit/ffeaf95cd8032a1338ca6a8f5c49820cda89dd13

Thanks, I'll proceed with this for now.

taiki-e commented 1 year ago

Updated cargo-llvm-cov to set NEXTEST_TEST_THREADS=1 by default.

I also tried llvm-cov and my coverage dropped even more.

FWIW, as for the difference in coverage results between grcov and cargo-llvm-cov, I think it is related to the fact that grcov seems to count coverage of test files by default, while cargo-llvm-cov tries to avoid this as much as possible.

The reason for cargo-llvm-cov tries to avoid this as much as possible is that counting the coverage of the test code itself as part of the coverage of the test target (library or binary being tested) would result in a different coverage than the actual coverage of the test target.[^1] (in many cases the test code itself is 100% covered)

So, it is not surprising that the coverage percentages drop when switching from grcov to cargo-llvm-cov; rather, it is maybe more accurate result of the coverage of the test target.

[^1]: The approach adopted by cargo-llvm-cov is not yet perfect, although

nextest-rs / nextest

Code coverage dropped significantly after switching to nextest #652