linux-test-project / lcov

Support cobertura format as input #274

Closed dilyanpalauzov closed 2 months ago

dilyanpalauzov commented 3 months ago

A software product is compiled on different systems with different compilers. E.g. this source code

#ifdef WIN32
  printf("WIN32\n");
#else
  printf("NOT WIN32\n");
#endif

is compiled on Linux with gcc or clang and on Windows with MSVC 2019. I prefer not to dig into how to compile on Windows with clang. Tests are run against the compiled code and produce code coverage reports. Some tests run only on Windows, some only on Linux, and some on both systems. I want to see the source code in HTML files showing which lines were executed at least once during the tests on Linux or on Windows. On Windows, the formats MSVC can produce are listed at https://devblogs.microsoft.com/visualstudio/code-coverage-features-in-visual-studio-enterprise/#support-of-additional-report-formats : CoverageXML (file extension .coveragexml), XML (.xml), Cobertura (.cobertura.xml), and Binary (.coverage).

Please extend lcov to accept as input any of the four coverage formats produced by MSVC. This way merged coverage reports can be generated for code executed under Windows and under Linux.
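To make the goal concrete, here is a minimal sketch of what "merged" means for line coverage: per-line hit counts from each platform are summed, so a line counts as executed if either platform ran it. The function name `merge_line_hits`, the file name `demo.c`, and the hit counts are all made up for illustration; this is not how lcov stores or merges data internally.

```python
from collections import Counter


def merge_line_hits(*reports):
    """Sum hit counts per (source file, line number) across reports.

    Each report is a dict mapping (filename, line) -> hit count.
    A line missing from one report (for example, code excluded by
    #ifdef on that platform) simply contributes nothing from it.
    """
    merged = Counter()
    for report in reports:
        for key, hits in report.items():
            merged[key] += hits
    return dict(merged)


# Hypothetical data: the Linux build only instruments the #else branch,
# the Windows build only instruments the #ifdef WIN32 branch.
linux_hits = {("demo.c", 18): 3}
windows_hits = {("demo.c", 16): 2}

merged = merge_line_hits(linux_hits, windows_hits)
for (filename, line), hits in sorted(merged.items()):
    print(f"{filename}:{line} executed {hits} time(s)")
```

In the merged view both `printf` lines show as covered, even though no single build ever compiles both.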

henry2cox commented 3 months ago

I took a very quick look at the coverage-*.dtd doc (seems to be no official spec?), then took a quick look at the existing py2lcov translator (part of the lcov package - see .../scripts/py2lcov and also py2lcov --help.) On first glance (and totally untested), it appears that this might be close to what you want.

Some things to note:

Presuming that this is close to what you are looking for, some refactoring is likely necessary to make it work in a reasonable fashion.

henry2cox commented 3 months ago

Implemented. Will push the fix/enhancement after some testing. Perhaps next week some time.

You may or may not need it, but you can generate a differential report to see the code exercised by A and not B, B but not A, and neither. We sometimes turn up bugs when something is hit that should not be, or vice versa.
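The categories mentioned above can be sketched with plain dictionaries. This is only an illustration of the bucketing idea, not genhtml's differential coverage implementation; the function name `categorize` and the sample hit counts are hypothetical.

```python
def categorize(hits_a, hits_b):
    """Bucket each known line by which run executed it.

    hits_a and hits_b map line number -> hit count for runs A and B.
    A line absent from a run is treated as not executed by that run.
    """
    buckets = {"both": [], "only_a": [], "only_b": [], "neither": []}
    for line in sorted(set(hits_a) | set(hits_b)):
        ran_a = hits_a.get(line, 0) > 0
        ran_b = hits_b.get(line, 0) > 0
        if ran_a and ran_b:
            buckets["both"].append(line)
        elif ran_a:
            buckets["only_a"].append(line)
        elif ran_b:
            buckets["only_b"].append(line)
        else:
            buckets["neither"].append(line)
    return buckets


# Hypothetical hit counts for two test runs over the same file.
run_a = {10: 5, 11: 0, 12: 1, 13: 0}
run_b = {10: 2, 11: 4, 12: 0, 13: 0}
result = categorize(run_a, run_b)
print(result)
```

Lines in `only_a` or `only_b` are the ones worth a second look when a platform-specific path is hit on the wrong platform.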

henry2cox commented 3 months ago

Clarifying the above comment:

This is a problem in at least 2 ways

The above issue and interpretation apply to any coverage data that arrives in lcov via XML import. Since py2lcov uses XML import internally, this also applies to Python code. Neither gcov, llvm-profdata, nor Perl Devel::Cover import has this issue (though they may have other issues).

dilyanpalauzov commented 3 months ago

the XML coverage data format does not contain enough information to deduce exactly which branch expressions have been taken or not taken.

cl.exe can generate three XML coverage formats, per the link above; it is not clear which one you mean.

Does the Cobertura format contain this information?

dilyanpalauzov commented 3 months ago

The above issue and interpretation is applied to any coverage data that arrives in lcov via XML import.

I do not understand this text about any XML format. Either the needed information is in the input files, or it is not there. How does the choice of XML format create problems with missing data, even if the data is available in the input?

henry2cox commented 3 months ago

cl.exe can generate three XML coverage formats, per the link above; it is not clear which one you mean.

I don't know about the cl.exe data formats - feel free to post or email me a set of representative examples, and I can check.

The format I tested with was found at https://gist.github.com/apetro/fcfffb8c4cdab2c1061d (~10Mb) and claims to be the XML spec version used by Cobertura. This is very similar to the XML format generated by Coverage.py, but the data in the above link has some additional fields. Neither the Cobertura nor the Coverage.py format contains enough information to resolve branch expressions.

Does the Cobertura format contain this information?

No.

henry2cox commented 3 months ago

I do not understand this text about any XML format. Either the needed information is in the input files, or it is not there. How does the choice of XML format create problems with missing data, even if the data is available in the input?

From my (limited) reading: it seems that there is some ambiguity or some discussion about exactly what the XML format for coverage data looks like. Certainly, the data produced by the Python tool is different than what appears to be produced by Cobertura (but note that the Cobertura data I looked at was for Java code...I don't know what it would have shown for Python code - nor do I know how it would have generated that Python data, if not through Coverage.py - so I tend to doubt that we would see anything new).

Thus the upshot is: no. None of the XML coverage data that I have seen contains sufficient information to identify and distinguish between branch expressions. There may be yet another XML flavor which does - but I have not seen such data and do not know of such a tool. I do know of multiple tools for other languages, which can (and do) contain such information.
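As a rough illustration of the point above, the sketch below reads a tiny Cobertura-style document and emits lcov tracefile line records (`SF:`, `DA:`, `end_of_record`). The sample XML is invented for this example, and the code assumes the commonly seen `hits` and `condition-coverage` attributes on `<line>` elements; it is not the translator shipped with lcov. Note that the `condition-coverage` attribute used here yields only a taken/total count per line, with nothing to say which branch expression was taken.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature Cobertura-style report, made up for this sketch.
SAMPLE = """<coverage>
  <packages>
    <package name="demo">
      <classes>
        <class name="demo" filename="demo.c">
          <lines>
            <line number="10" hits="4" branch="false"/>
            <line number="11" hits="4" branch="true" condition-coverage="50% (1/2)"/>
          </lines>
        </class>
      </classes>
    </package>
  </packages>
</coverage>"""


def to_tracefile_records(xml_text):
    """Translate class-level <line> elements into lcov-style line records."""
    records = []
    root = ET.fromstring(xml_text)
    for cls in root.iter("class"):
        records.append("SF:" + cls.get("filename"))
        # Only direct <lines>/<line> children, so per-method copies of the
        # same lines (if present) are not counted twice.
        for line in cls.findall("./lines/line"):
            records.append("DA:{},{}".format(line.get("number"), line.get("hits")))
        records.append("end_of_record")
    return records


def branch_counts(xml_text):
    """Return {line number: (taken, total)} from condition-coverage attributes."""
    counts = {}
    root = ET.fromstring(xml_text)
    for line in root.iter("line"):
        match = re.search(r"\((\d+)/(\d+)\)", line.get("condition-coverage", ""))
        if match:
            counts[int(line.get("number"))] = (int(match.group(1)), int(match.group(2)))
    return counts


records = to_tracefile_records(SAMPLE)
counts = branch_counts(SAMPLE)
print("\n".join(records))
print(counts)
```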

henry2cox commented 3 months ago

This should be addressed in commit f18d34d45a. Please give it a try and see if it works as you expected. If so, please go ahead and close this issue. If not, please describe the problems you see and, ideally, include a testcase which illustrates the bugs.

dilyanpalauzov commented 2 months ago

I was told that the Cobertura output created by Microsoft’s cl.exe/instrumentation utilities contains C++ mangled names, and that demangling is not handled anywhere.

Moreover, the same output contains method names without C++ class names, so when taking input from two compilers (gcc and MS/Cobertura) and mapping one onto the other, the function names do not match.

I personally have no access to Microsoft software that generates coverage information. At the same time, my focus has moved away from test coverage, so I’m closing this.

henry2cox commented 2 months ago

I was told that the Cobertura output created by Microsoft’s cl.exe/instrumentation utilities contains C++ mangled names, and that demangling is not handled anywhere.

Both lcov and genhtml support demangling; see the --demangle section in the man pages. However, the xml2lcov translator does not support demangling. It should be possible to read the xml2lcov output (possibly containing mangled names) into lcov (or genhtml), demangle, and then write out translated .info data (or a demangled HTML report).

Moreover, the same output contains method names without C++ class names, so when taking input from two compilers (gcc and MS/Cobertura) and mapping one onto the other, the function names do not match.

If mangling is different between different tools and you want a unified report, then you would either have to demangle the individual tool data separately and then aggregate, or write your own demangle tool which handles the different formats. The latter would be possible only if your wrapper could distinguish gcc vs MS names from context.
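One way a wrapper might tell the two naming schemes apart is by prefix: Itanium ABI names (gcc, clang) start with `_Z`, while MSVC decorated C++ names start with `?`. The sketch below only classifies and groups names so each group could be handed to the matching demangler; the function names and sample symbols are hypothetical, and it does no demangling itself.

```python
def mangling_scheme(symbol):
    """Guess the C++ name mangling scheme from the symbol's prefix."""
    if symbol.startswith("?"):
        return "msvc"
    # Some platforms add an extra leading underscore to Itanium names.
    if symbol.startswith(("_Z", "__Z")):
        return "itanium"
    return "plain"


def group_by_scheme(symbols):
    """Group symbols so each group can go to the matching demangler."""
    groups = {}
    for symbol in symbols:
        groups.setdefault(mangling_scheme(symbol), []).append(symbol)
    return groups


# Hypothetical function names as they might appear in two coverage reports.
names = ["_ZN3Foo3barEv", "?bar@Foo@@QEAAXXZ", "main"]
groups = group_by_scheme(names)
print(groups)
```

Names that arrive with no class qualifier and no mangling fall into `plain`, and those are the ones that cannot be matched across tools without extra context.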

I personally have no access to Microsoft software that generates coverage information. At the same time, my focus has moved away from test coverage, so I’m closing this.

Sounds good to me. Presumably, if anyone else is interested in XML/Cobertura and finds lcov bugs, they will file a new issue.

Henry