Open zackw opened 1 month ago
One thing you might want to do is simply generate a differential report between the two. (There are obviously no source differences, so any differences in the numbers are down to the tools.) You probably want to turn off function coverage, as that will somewhat pollute the data, given that py2lcov will report it but coverage.py won't (so the differences will look like pure additions or deletions, depending on the direction of the comparison: entirely predictable, and completely unhelpful).
The differential report will call out all the inconsistencies.
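As a rough complement to the genhtml differential report, the line records in the two tracefiles can also be compared directly. The sketch below is not part of lcov; it assumes the usual SF: / DA: / end_of_record layout of .info files, and the helper names parse_da and diff_da are made up for illustration.

```python
from collections import defaultdict


def parse_da(text):
    """Map each source file to {line_number: hit_count} from DA records."""
    result = defaultdict(dict)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current = line[3:]
        elif line.startswith("DA:") and current is not None:
            # DA:<line>,<hits>[,<checksum>] ; only the first two fields are used
            fields = line[3:].split(",")
            result[current][int(fields[0])] = int(fields[1])
        elif line == "end_of_record":
            current = None
    return dict(result)


def _state(hits):
    """None = no line record, False = recorded but not hit, True = hit."""
    return None if hits is None else hits > 0


def diff_da(a, b):
    """Yield (file, line, hits_a, hits_b) where the two inputs disagree.

    Only presence and hit/not-hit status are compared, not raw counts.
    """
    for path in sorted(set(a) | set(b)):
        lines_a, lines_b = a.get(path, {}), b.get(path, {})
        for n in sorted(set(lines_a) | set(lines_b)):
            ha, hb = lines_a.get(n), lines_b.get(n)
            if _state(ha) != _state(hb):
                yield (path, n, ha, hb)
```

Feeding it the contents of A.lcov and B.lcov would list every line that one tool reports and the other does not, or that only one of them considers hit.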
I'm very confused that you keep saying py2lcov will report function coverage data when, for me, it definitely does not, and I don't see how it possibly can, given that coverage xml doesn't either.
Anyway, here's the result of genhtml --baseline-file B.lcov --diff-file /dev/null --ignore-errors empty -o ABdiff --flat --no-function-coverage A.lcov. There's definitely something weird going on here; just at a quick glance, several of the GIC and GUC lines are the beginnings of docstrings, which should be treated as lines that generated no code.
Adding --branch-coverage to the invocation gives me
Reading tracefile A.lcov.
genhtml: ERROR: (corrupt) unable to read trace file 'A.lcov':
genhtml: ERROR: (inconsistent) "A.lcov":67: unexpected line number '0' in .info file record 'BRDA:0,0,0,1'
which is the aforementioned nedbat/coveragepy#1846 issue.
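To see how many records would trip this check, a tracefile can be scanned for BRDA records whose first field is 0. This is a minimal sketch, assuming the BRDA:<line>,<block>,<branch>,<taken> layout visible in the error message; find_zero_line_brda is a hypothetical helper, not an lcov or coverage.py function.

```python
def find_zero_line_brda(text):
    """Return (record_number, record) for each BRDA record with line number 0.

    record_number is the 1-based position of the record in the tracefile,
    matching the ':67' style position shown in the genhtml error message.
    """
    bad = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        record = raw.strip()
        if record.startswith("BRDA:"):
            line_field = record[len("BRDA:"):].split(",", 1)[0]
            if line_field == "0":
                bad.append((lineno, record))
    return bad
```

Running it on the contents of A.lcov should report at least the record at position 67 that genhtml complained about.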
For the first question: if the tool can find the source file, then it parses the file to find function decls (...making some assumptions about indentation rules). It then generates a function coverage record, claiming that the function is hit as many times as the first line in the function was.
If your version of py2lcov does not contain that support - which version are you using, and does it look like this one?
Note that the tool assumes that the file it found is the one referred to in the input data. That is generally true - but maybe not, if there is some delay between test execution and data extraction, and if people are not so careful with revision management.
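The idea of deriving function records from line data can be illustrated roughly as follows. This is not py2lcov's code: it swaps the indentation-based parsing described above for Python's standard ast module, and it assumes that "the first line in the function" means the first body line that has a line record. The name function_records is made up for the example.

```python
import ast


def function_records(source, line_hits):
    """Return [(name, def_line, hits)] for each function defined in source.

    line_hits maps line number -> hit count, as found in DA records.
    Assumption: a function's count is copied from the first line of its
    body that has line data; it is 0 if none of its body lines do.
    """
    records = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            hits = 0
            # end_lineno is available on Python 3.8 and later
            for n in range(node.body[0].lineno, node.end_lineno + 1):
                if n in line_hits:
                    hits = line_hits[n]
                    break
            records.append((node.name, node.lineno, hits))
    return sorted(records, key=lambda r: r[1])
```

Because the source is parsed at extraction time, the result is only meaningful if the file on disk is the same revision that produced the line data.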
In a larger environment: this is also why it is a good idea to use the --version-script feature to tag the coverage data, so that subsequent users can do some validity checking. (A long time ago, we had some very strange validation results that took some time to figure out. Never again.)
For the second: if we have GIC and UIC lines, then 'A.lcov' is claiming that there are line coverpoints in those locations. From your description, that does not seem correct. Not sure why that happens (I would need to look at the testcase).
For the third: you can tell the tool to --ignore-errors inconsistent - and then it will skip that record. (There may be subsequent issues that also need to be ignored, and it is possible that you then hit a fatal error that cannot be ignored.)
Took a while (sorry) but I finally looked at the ABDiff data.
It appears to me that the GIC lines are all docstrings (or the first line of the docstring, if the string extends across multiple lines). I believe that this is because Coverage.py has marked those as lines of code (typically with a hit count of 1).
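One way to check this observation is to list where docstrings begin in a source file and compare those line numbers against the lines flagged in the report. The following is a minimal sketch using Python's standard ast module; docstring_start_lines is an illustrative helper and not part of Coverage.py or lcov.

```python
import ast


def docstring_start_lines(source):
    """Return the sorted line numbers on which docstrings begin."""
    starts = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            # A docstring is a bare string expression in first position
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                starts.append(body[0].lineno)
    return sorted(starts)
```

Intersecting the result with the line numbers that carry DA records in a tracefile would show which line coverpoints sit on docstrings.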
Regarding "Adding --branch-coverage to the invocation gives me": I forgot to mention that, as the message indicates, you can add --ignore-errors inconsistent to get past the error check (turning it into a warning) if you want to continue your experiment and compare branch coverage data between the tools.
Not sure of the current status of this issue - but I think that the recently added coverage data consistency checks should help to debug any lingering Coverage.py issues. Similarly, it might be useful to compare the Coverage.py-generated LCOV format output to the py2lcov-generated output. There may be bugs in either - or both.
I have been looking into the possibility of using lcov's genhtml to generate a custom coverage dashboard for https://github.com/MillionConcepts/pdr. This project is written in Python, so I'm experimenting with several different ways of converting coverage.py data to lcov; so far nothing has come out quite the way I want it. I think most of the problems I am having are bugs and lacunae in coverage.py, not lcov (see https://github.com/nedbat/coveragepy/issues/1846) but I think some of the analysis I've produced may be useful to the lcov project anyway.
To reproduce the raw coverage.py database whose SQL dump I have attached, do the following:
Having done the above, I then generated an lcov-format coverage report two different ways:
- coverage lcov (A.lcov in the attached diff)
- py2lcov (equivalent to coverage xml followed by xml2lcov AFAICT) (B.lcov in the attached diff)
and normalized both for comparison purposes as follows:
I believe that all remaining differences in the output indicate a bug in something. I'm pretty sure the radical differences in BRDA records are the aforementioned https://github.com/nedbat/coveragepy/issues/1846 and related, but I'm not sure what's up with the DA record differences.
Please note the absence of function coverage records in B.lcov, contra https://github.com/linux-test-project/lcov/issues/317#issuecomment-2334203032.
coverage+linemax.sql.gz lcov-gen-comparison.diff.gz