Open · jcubic opened this issue 10 months ago
My guess (as a general dev who fights coverage on and off) is that as you've changed the numerator and denominator, the percentages will be slightly different.
Imagine you had 4 lines of code and 3 were tested (3/4 = 75%). If you add one tested line of code, you would then have 4/5 = 80%, which shows up as a significant change (+5 percentage points). If instead you have 990 covered lines out of 999 and add 1 covered line, you move from 990/999 to 991/1000, a change of only 0.000009009009... (roughly 0.0009 percentage points), which is approximately zero (≈0).
Remember that coverage is reported as a percentage, it isn't a count of uncovered lines, so when you add a covered line you're mostly seeing the effects of changing the denominator.
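The arithmetic above is easy to sketch in a few lines of plain JavaScript (nothing here is project-specific; it just shows how file size dampens the effect of one added covered line):

```javascript
// Coverage is a ratio: covered / relevant.
function coverage(covered, relevant) {
  return covered / relevant;
}

// Small file: 3/4 covered, add one covered line -> 4/5.
const smallDelta = coverage(4, 5) - coverage(3, 4);
console.log(smallDelta.toFixed(6)); // "0.050000" (+5 percentage points)

// Large file: 990/999 covered, add one covered line -> 991/1000.
const largeDelta = coverage(991, 1000) - coverage(990, 999);
console.log(largeDelta.toFixed(9)); // "0.000009009" (≈0)
```

Same action (add one covered line), wildly different percentage movement, purely because both the numerator and the denominator changed.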
@jcubic @jsoref I will start by adding a +1 to @jsoref's comment, and add that the ideal starting point for comparing coverage, and for understanding the reasons for any changes in coverage (especially unexpected ones), is to look at coverage stats rather than coverage percentages.
Summary: Below I try to provide an investigative method to determine the root cause for unexplained coverage changes between builds. Please try it for specific builds you have specific questions about and then, if the reasons still aren't clear, I will be happy to help dig deeper and we'll be on the same page.
Details: At the level of a build, you can find coverage stats in the RUN DETAILS column, in the Build Info table at the top of each Build Page.
As an example, let's take your first cited build, which showed a coverage drop of -0.02%:
https://coveralls.io/builds/65153629
Here are the coverage stats:
(Please ignore the collapsible window of Admin data from my view.)
In terms of changes in coverage, each build is compared with its "preceding build," or "previous build," which, in the case of `push` commits, is the previous `push` commit on the same branch; and, in the case of `pull_request` commits, is the base commit of the PR.
In the case of this build (https://coveralls.io/builds/65153629), it's a `push` build, so the previous build is the `push` build right before it on `master`:
https://coveralls.io/builds/65131760
Which you can get to by clicking the "Prev Build" link in the inner left nav of your Build Page:
So let's compare the changes in coverage, by comparing changes in stats between the Previous Build (Build #7560115104) (LEFT), and the Current Build (Build #7571839635) (RIGHT):
The first issue I see there is the (0.0%) percentage given for the branch coverage ratio.
You have this repo setting ENABLED:
So branch coverage should be included in your aggregate coverage calculation, and (0.0%) is definitely not correct for either of those two ratios in either of those two builds' RUN DETAILS sections.
(So that is probably a bug and I have asked someone to look into it. However, unfortunately, due to the age of the build, we no longer have the original coverage report stored, so we don't have that to examine. That said, if those are the stats we got, that part of the aggregate coverage % calculation is clearly inaccurate.)
Now would be a good time to share the formula for aggregate coverage %, which is:
aggregate coverage w/ branches = (covered lines + covered branches) / (relevant lines + relevant branches)
Or, without branch coverage included:
aggregate coverage w/o branches = covered lines / relevant lines
In your case, just the branch coverage portions of those two calculations should have been:
Build #7560115104 (LEFT): 2236 of 2586 branches covered: (86.47%)
And:
Build #7571839635 (RIGHT): 2260 of 2614 branches covered: (86.46%)
Which, using the formula would have resulted in your aggregate coverage calculations being:
Build #7560115104 (LEFT):
aggregate coverage w/ branches = (covered lines + covered branches) / (relevant lines + relevant branches)
aggregate coverage w/ branches = (9464 + 2236) / (11018 + 2586)
aggregate coverage w/ branches = 11700 / 13604
aggregate coverage w/ branches = 86.004%
Which is what we got, so I suspect the (0.0%) is a display bug rather than a calculation one.
And:
Build #7571839635 (RIGHT):
aggregate coverage w/ branches = (covered lines + covered branches) / (relevant lines + relevant branches)
aggregate coverage w/ branches = (9464 + 2260) / (11021 + 2614)
aggregate coverage w/ branches = 11724 / 13635
aggregate coverage w/ branches = 85.985%
Which is also what we got for that build, so, again, I suspect the (0.0%) is a display issue rather than a calculation one.
But if you don't mind, let's leave that aside for now. Before I finish this I will try re-processing each build to see if that issue might resolve itself.
Regardless, though, I believe the aggregate coverage % calculations are accurate per the stats we received.
The next place to look to understand where those stats came from is the CHANGED tab in your SOURCE FILES table(s).
Since we're trying to understand the changes between those two builds, let's look at the CHANGED tab for the Current Build (Build #7571839635 (RIGHT)): https://coveralls.io/builds/65153629
According to the CHANGED tab, only one file changed, and, in terms of coverage changes, the relevant changes are the LINE MISSES and BRANCH MISSES there (7 new misses).
These changes are easier to understand in a side-by-side context as well, so after searching for that file name in the SOURCE FILES table of your previous build, here's that comparison:
I hope that's helpful as a general approach to investigating coverage changes.
Please use it as a guide and let me know if you have any open questions that remain for your other builds. We can even start with the one I used here in case anything looks off to you.
Also, if you'd like to choose some more example builds, more recent builds are better in terms of us still having their original coverage reports on hand.
I still don't understand how, by adding one branch that is covered by tests, I get +3 missed lines and +4 missed branches.
https://github.com/LIPS-scheme/lips/commit/28a58e6e1f95f6a18307d96007157413a9af791c
I understand the calculation part, but it looks like the information about the changes is wrong.
If you want, I can try to create a copy of my project with the above change as the last one, so you will be able to see the coverage data.
Hi @jcubic.
I understand: You only added three (3) lines to the project. Two (2) of those lines---representing a new branch---were uncovered by tests, so you wrote a test to cover those two lines (by covering the new branch).
Therefore, why +3 missed lines, and +4 missed branches?
I was also curious, so I performed some analysis, and what I can say by way of response is:

The coverage reports we received for src/lips.js were significantly different, in ways you might not expect, between CI Build 1 (Build #7560115104) and CI Build 2 (Build #7571839635). My efforts to support that claim with details are documented below.

And:
If those coverage stats change in unexpected ways (beyond the scope of your changes and the changes in coverage you expect), for whatever reason, then your test suite can be said to be non-deterministic. I'm not saying yours is, just that, based on these results, it might warrant further investigation.
Coveralls makes no judgements about or changes to the coverage reports it receives from users. File-by-file, or build-by-build, it just consumes the stats in the reports and then performs aggregate coverage calculations on them.
In the case of this one (quite large) file, src/lips.js (with 11,021 lines), which gained three (3) lines between Build #7560115104 (LEFT) and Build #7571839635 (RIGHT), I discovered that the coverage report we received for that file from your test suite was significantly different between those two builds.
The results are as follows:
Before the results, though, my methods: I compared the coverage reports we received for src/lips.js across two CI builds ("CI Build 1" and "CI Build 2"), where:

- CI Build 1 corresponds to Coveralls build: https://coveralls.io/builds/65131760
- CI Build 2 corresponds to Coveralls build: https://coveralls.io/builds/65153629

Note: The total changes between the files span 4 diff lines, but they represent only 3 new lines, Lines 1497-1499. This is relevant to some answers below.
Here's the analysis from comparing the coverage report for src/lips.js from CI Build 1 to the coverage report for src/lips.js from CI Build 2:
Number of lines with `0` coverage (not covered):

- CI Build 1 (`7560115104-lips.js.json`): 1,554 lines have `0` coverage.
- CI Build 2 (`7571839635-lips.js.json`): 1,557 lines have `0` coverage.
coverage.Comparison of uncovered lines:
List of "zero (0) lines":
CI Build 1 (Example):
0
0
0
0
0
0
0
0
0
0
CI Build 2 (Example):
0
0
0
0
0
0
0
0
0
0
Unique "zero (0) lines" in CI Build 2 (Lines 1497 and above that don't exist three lines lower in CI Build 1):
0
0
0
0
0
0
0
0
0
0
Lines that were uncovered in CI Build 1 but became covered in CI Build 2:

Part A (lines before 1497):

Part B (lines 1500 and after, adjusted for the three-line shift), by hit count change (the specific line numbers were not preserved in this excerpt):

- `0 -> 1`
- `0 -> 841397`
- `0 -> 841397`
- `0 -> 14`
- `0 -> 24`
- `0 -> 24`
- `0 -> 24`
- `0 -> 19`
- `0 -> 19`
- `0 -> 19`
In case it's not clear from the above, it means:

- New missed lines: the unique "zero (0) lines" that appear only in CI Build 2.
- New covered lines: the lines whose hit counts moved from `0` to a positive number between the two builds.
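For anyone who wants to repeat this comparison locally, a sketch along these lines works. Note the assumption: the inputs here are flat `{ lineNumber: hitCount }` objects, whereas real c8/Istanbul JSON output is nested, so you would flatten your reports into that shape first:

```javascript
// Find lines with a hit count of 0 ("zero (0) lines") in a coverage report.
// ASSUMPTION: report is a flat object mapping line numbers to hit counts,
// e.g. { "1497": 0, "1498": 24 }. Flatten real c8/Istanbul JSON before use.
function zeroLines(report) {
  return Object.keys(report).filter((line) => report[line] === 0);
}

// Diff the uncovered-line sets of two reports.
function diffZeroLines(reportA, reportB) {
  const a = new Set(zeroLines(reportA));
  const b = new Set(zeroLines(reportB));
  return {
    onlyInA: [...a].filter((line) => !b.has(line)), // newly covered in B
    onlyInB: [...b].filter((line) => !a.has(line)), // newly missed in B
  };
}

// Tiny illustrative inputs (not the real reports):
const build1 = { 10: 1, 11: 0, 12: 0 };
const build2 = { 10: 0, 11: 0, 12: 5 };
console.log(diffZeroLines(build1, build2));
// { onlyInA: [ '12' ], onlyInB: [ '10' ] }
```

Running this against two builds' reports for the same file (with the line-number shift accounted for) surfaces exactly the kind of "new missed" and "new covered" lists shown above.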
In case it's helpful, these are some of the reasons we've seen for unexpected coverage changes (aka. non-deterministic tests).
A test suite is considered non-deterministic, or said to contain "flaky tests," when it produces inconsistent results across different runs, even when the code being tested hasn't changed.
This can occur for various reasons, including:
- Race Conditions: Tests might depend on timing or the order in which operations are performed, leading to intermittent failures when these conditions vary.
- Unstable Dependencies: Tests that rely on external services, databases, or APIs might fail if those services are temporarily unavailable or behave unpredictably.
- Environment Sensitivity: Differences in the test environment, such as available memory, CPU load, or network conditions, can cause tests to pass or fail inconsistently.
- Randomness: Tests that include randomness, such as random data generation, can behave differently across runs, leading to inconsistent results.
- Shared State: If tests are not properly isolated and share state, one test might affect the outcome of another, leading to non-deterministic failures.
- Concurrency Issues: Tests involving multiple threads or processes might experience synchronization issues, leading to inconsistent outcomes.
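To make the "Randomness" case concrete, here is a minimal, hypothetical example (not taken from the lips codebase) of code whose coverage changes run to run even though the code itself never changes:

```javascript
// Hypothetical example: which branch executes depends on Math.random(),
// so line and branch coverage for this function differ between test runs
// unless every run happens to exercise both branches.
function greet(name) {
  if (Math.random() < 0.5) {
    return `Hello, ${name}!`; // covered only in runs where the coin lands heads
  }
  return `Hi, ${name}!`;      // covered only in the other runs
}

console.log(greet('jcubic'));
```

A single test calling `greet()` once will report one branch missed, but *which* branch is missed varies, and that variance shows up as small, apparently random coverage movements between builds. Seeding or stubbing the randomness in tests removes this class of noise.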
P.S. @jcubic. Another possibility is that some tests were missing from the test suite on one or the other commit. Unfortunately, GitHub doesn't have those Actions logs anymore either.
I checked for any original coverage reports we had stored and unfortunately we don't have any. (Your last build was from Apr 2; I think we store them for 90 days.)
One option is to simply re-run one of your more recent builds. That should re-send the coverage report and generate a new build. If you get a message that the "build is closed", you can simply delete the existing build at Coveralls (see screenshot for Delete link), and re-run from CI. That should ensure an entirely new build.
In this commit:
https://github.com/jcubic/lips/commit/28a58e6e1f95f6a18307d96007157413a9af791c
I've added this code:
The code was tested and is covered by tests; I checked the JSON file generated by c8 and the lines are green. But Coveralls said that coverage dropped by 0.02%:
https://coveralls.io/builds/65153629
Similarly, when I deleted one line of code, Coveralls said that coverage dropped by 0.002%:
https://coveralls.io/builds/65177310
In another example, I updated dependencies and coverage dropped by 0.3%:
https://coveralls.io/builds/65163814
It looks like the reports are random. Can you explain these coverage drops?
This is my coverage status:
Not sure why I have 85.669%.

EDIT: Another issue is the reverse: code was added that was fully tested, and Coveralls says that coverage increased by 0.02%, even though nothing else changed:
https://coveralls.io/builds/65485219