Differences between `coverage lcov` and `py2lcov` for consideration

zackw commented 1 month ago

I have been looking into the possibility of using lcov's genhtml to generate a custom coverage dashboard for https://github.com/MillionConcepts/pdr. This project is written in Python, so I'm experimenting with several different ways of converting coverage.py data to lcov; so far nothing has come out quite the way I want it. I think most of the problems I am having are bugs and lacunae in coverage.py, not lcov (see https://github.com/nedbat/coveragepy/issues/1846) but I think some of the analysis I've produced may be useful to the lcov project anyway.

To reproduce the raw coverage.py database whose SQL dump I have attached, do the following:

git clone https://github.com/MillionConcepts/pdr
cd pdr
git switch --detach 092675e442438ddc39d4bb9e9d0d5f4c1e317f29
python3 -m venv .venv
. .venv/bin/activate
pip install pytest-cov
pip install -e '.[browsify,fits,pvl,tiff]'
sed -i -e '/formats/d; /pvl_utils/d' .coveragerc
pytest --cov --cov-branch --cov-report= --import-mode=importlib
(echo max_lineno,path
 find pdr -name '*.py' -exec wc -l '{}' + | sed '/total/d; s/^ *//; s/ /,/g'
) > max-lineno.csv
sqlite3 .coverage \
'.import --csv --schema temp max-lineno.csv max_lineno
alter table file add column max_lineno integer;
update file set path = replace(path, '\'"$PWD"/\'', '\'\'');
update file set max_lineno = ml.max_lineno from temp.max_lineno as ml
  where ml.path = file.path;'

Having done the above, I then generated an lcov-format coverage report two different ways:

- coverage lcov (A.lcov in the attached diff)
- py2lcov (equivalent to coverage xml followed by xml2lcov AFAICT) (B.lcov in the attached diff)

and normalized both for comparison purposes as follows:

- all TN: lines were stripped
- checksums were removed from all DA: lines
- records were sorted by SF: pathname
- within each record, the sequence of sub-record types was canonicalized to DA, LF, LH, BRDA, BRF, BRH
- DA and BRDA lines were sorted numerically by line number
- vacuous BRF:0 BRH:0 and LF:0 LH:0 pairs were removed
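For readers who want to repeat the comparison, the normalization steps listed above can be sketched in Python roughly as follows. This is not the script that was actually used to produce the attached diff; the function name `normalize_lcov` is made up for illustration, and the choice to append any other record types (FN, FNDA, and so on) after the six canonical ones in alphabetical order is an assumption.

```python
# Canonical order of sub-record types inside one SF: ... end_of_record block.
ORDER = ["DA", "LF", "LH", "BRDA", "BRF", "BRH"]


def normalize_lcov(text):
    """Return a normalized copy of an lcov tracefile given as a string."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("TN:"):
            continue  # strip blank lines and all TN: lines
        if line.startswith("SF:"):
            current = records.setdefault(line[3:], [])
        elif line == "end_of_record":
            current = None
        elif current is not None:
            current.append(line)

    out = []
    for path in sorted(records):  # sort records by SF: pathname
        by_kind = {}
        for line in records[path]:
            kind, _, value = line.partition(":")
            if kind == "DA":
                # DA:<line>,<hits>[,<checksum>] -> keep line and hits only
                value = ",".join(value.split(",")[:2])
            by_kind.setdefault(kind, []).append(value)
        # Remove vacuous LF:0/LH:0 and BRF:0/BRH:0 pairs.
        for found, hit in (("LF", "LH"), ("BRF", "BRH")):
            if by_kind.get(found) == ["0"] and by_kind.get(hit) == ["0"]:
                del by_kind[found], by_kind[hit]
        out.append("SF:" + path)
        # Assumption: record types outside ORDER are kept, after the rest.
        extra = sorted(k for k in by_kind if k not in ORDER)
        for kind in ORDER + extra:
            values = by_kind.get(kind, [])
            if kind in ("DA", "BRDA"):
                # Sort numerically by source line number.
                values = sorted(values, key=lambda v: int(v.split(",")[0]))
            out.extend(f"{kind}:{v}" for v in values)
        out.append("end_of_record")
    return "\n".join(out) + "\n"
```

Running both tracefiles through the same function and then diffing the results should leave only differences that come from the generating tools.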

I believe that all remaining differences in the output indicate a bug in something. I'm pretty sure the radical differences in BRDA records are the aforementioned https://github.com/nedbat/coveragepy/issues/1846 and related, but I'm not sure what's up with the DA record differences.

Please note the absence of function coverage records in B.lcov, contra https://github.com/linux-test-project/lcov/issues/317#issuecomment-2334203032.

coverage+linemax.sql.gz lcov-gen-comparison.diff.gz

henry2cox commented 1 month ago

One thing you might want to do is simply generate a differential report between the two. (There are obviously no source differences, so any differences in numbers are down to the tools.) You probably want to turn off function coverage, as that will somewhat pollute the data given that py2lcov will report it but coverage.py won't (so the functions will look like pure additions or deletions, depending on the direction of the comparison ... entirely predictable, and completely unhelpful).

The differential report will call out all the inconsistencies.

zackw commented 1 month ago

I'm very confused that you keep saying py2lcov will report function coverage data when for me it definitely does not, and I don't see how it possibly can, given that coverage xml doesn't either.

Anyway, here's the result of genhtml --baseline-file B.lcov --diff-file /dev/null --ignore-errors empty -o ABdiff --flat --no-function-coverage A.lcov. There's definitely something weird going on here; just on a quick glance, several of the GIC and GUC lines are the beginnings of docstrings, which should be treated as lines that generated no code.

ABdiff.tar.gz
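One way to check the docstring observation mechanically is to list the lines on which docstrings begin and compare them against the DA line numbers in each tracefile. The following is only a sketch using the standard `ast` module; the helper name `docstring_start_lines` is hypothetical and it is not part of lcov or coverage.py.

```python
import ast


def docstring_start_lines(source):
    """Return the sorted line numbers on which a docstring begins."""
    scopes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    lines = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, scopes) or not node.body:
            continue
        first = node.body[0]
        # A docstring is a bare string expression that opens the body.
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            lines.add(first.lineno)
    return sorted(lines)
```

Any DA record whose line number appears in this list for the corresponding source file is a line coverpoint placed on a docstring.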

Adding --branch-coverage to the invocation gives me

Reading tracefile A.lcov.
genhtml: ERROR: (corrupt) unable to read trace file 'A.lcov':
genhtml: ERROR: (inconsistent) "A.lcov":67: unexpected line number '0' in .info file record 'BRDA:0,0,0,1'

which is the aforementioned nedbat/coveragepy#1846 issue.
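To find every record of this kind before running genhtml, a tracefile can be scanned for DA and BRDA records whose source line number is missing or less than 1. This is a minimal sketch and does not reproduce genhtml's own consistency checks; the function name `find_bad_line_numbers` is an assumption made for illustration.

```python
def find_bad_line_numbers(text):
    """Return (tracefile_line, record) pairs for DA/BRDA records
    whose source line number is not an integer >= 1."""
    bad = []
    for n, line in enumerate(text.splitlines(), start=1):
        record = line.strip()
        kind, sep, value = record.partition(":")
        if not sep or kind not in ("DA", "BRDA"):
            continue
        # In both record types the source line number is the first field.
        first = value.split(",")[0]
        try:
            ok = int(first) >= 1
        except ValueError:
            ok = False
        if not ok:
            bad.append((n, record))
    return bad
```

Applied to the contents of A.lcov, the output would be expected to include the record quoted in the error message above, along with its position in the tracefile.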

henry2cox commented 1 month ago

For the first question: if the tool can find the source file, then it parses the file to find function decls (...making some assumptions about indentation rules). It then generates a function coverage record, claiming that the function is hit as many times as the first line in the function was. If your version of py2lcov does not contain that support: which version are you using, and does it look like this one? Note that the tool assumes that the file it found is the one referred to in the input data. That is generally true, but maybe not if there is some delay between test execution and data extraction, and if people are not so careful with revision management. In a larger environment, this is also why it is a good idea to use the --version-script feature to tag the coverage data so that subsequent users can do some validity checking. (Long ago, we had some very strange validation results that took some time to figure out. Never again.)

For the second: if we have GIC and UIC lines, then 'A.lcov' is claiming that there are line coverpoints in those locations. From your description, that does not seem correct. Not sure why that happens (would need to look at the testcase).

For the third: you can tell the tool to --ignore-errors inconsistent, and then it will skip that record. (There may be subsequent issues that also need to be ignored, and it is possible that you then hit a fatal error that cannot be ignored.)
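The function-record idea described in the first answer can be illustrated with a short sketch. This is not py2lcov's code: it uses Python's `ast` module instead of indentation heuristics, the name `function_records` is hypothetical, and it assumes that "the first line in the function" means the first body statement that has line coverage data.

```python
import ast


def function_records(source, line_hits):
    """Build FN/FNDA lines from source text and a {line: hits} mapping."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Assumption: take the hit count of the first body statement
            # that has a DA entry; report 0 if none has one.
            hits = next((line_hits[stmt.lineno] for stmt in node.body
                         if stmt.lineno in line_hits), 0)
            found.append((node.lineno, node.name, hits))
    found.sort()
    # FN:<start line>,<name> followed by FNDA:<hit count>,<name>
    return ([f"FN:{line},{name}" for line, name, _ in found]
            + [f"FNDA:{hits},{name}" for _, name, hits in found])
```

The sketch uses bare function names, so methods with the same name in different classes would collide; a real tool would need some form of qualified name.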

henry2cox commented 1 month ago

Took a while (sorry) but I finally looked at the ABDiff data.

It appears to me that the GIC lines are all docstrings (or the first line of the docstring if the string extends across multiple lines). I believe that this is because Coverage.py has marked those as lines of code (hit 1 time, typically).
- The ones I would be more concerned about are the GBC and LBC lines, as those are places where both 'capture' tools agree that there is code on the corresponding line but disagree about whether the line is exercised or not. If both tools were given the same initial data (i.e., not two different test suites, not two different execution traces with different random seeds, etc.), then there is a bug somewhere.
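The disagreement described here can also be listed directly from the two tracefiles, without going through genhtml. The sketch below reports lines that have a DA record in both files but are hit in only one of them; this is roughly the set of lines the differential report flags as GBC or LBC. The names `parse_da` and `disagreements` are hypothetical.

```python
def parse_da(text):
    """Return {(path, line): hits} from the DA records of a tracefile."""
    hits = {}
    path = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("SF:"):
            path = line[3:]
        elif line.startswith("DA:") and path is not None:
            fields = line[3:].split(",")
            key = (path, int(fields[0]))
            # Sum counts if the same line is reported more than once.
            hits[key] = hits.get(key, 0) + int(fields[1])
    return hits


def disagreements(baseline_text, current_text):
    """List (path, line, baseline_hits, current_hits) for lines present
    in both tracefiles but covered in only one of them."""
    base, cur = parse_da(baseline_text), parse_da(current_text)
    out = []
    for key in sorted(base.keys() & cur.keys()):
        if (base[key] > 0) != (cur[key] > 0):
            out.append((key[0], key[1], base[key], cur[key]))
    return out
```

With B.lcov as the baseline and A.lcov as the current data, an empty result would mean the two tools agree on every line they both consider code.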

henry2cox commented 1 month ago

> Adding --branch-coverage to the invocation gives me

Forgot to mention: as the message indicates, you can add --ignore-errors inconsistent to get past the error check (turn it into a warning) - if you want to continue your experiment, to compare branch coverage data between the tools.

henry2cox commented 2 weeks ago

Not sure of the current status of this issue, but I think that the recently added coverage data consistency checks should help to debug any lingering Coverage.py issues. Similarly, it might be useful to compare the Coverage.py-generated LCOV format output to the py2lcov-generated output. There may be bugs in either, or both.

linux-test-project / lcov

Differences between `coverage lcov` and `py2lcov` for consideration #318