Closed hramrach closed 3 months ago
As it turns out, yes.
Pretty sure we use substitutions in every project that uses lcov/diffcov (...which is why the option exists).
We tend not to use --no-external
though - too blunt an instrument.
You did read the man page, and found the lcov --base-directory
option, right?
Similarly, you read up on the other closely related options - right?
You also understand that nobody can debug your problems if you do not provide a testcase - right?
That's the general problem with lcov. The man page is insufficient to understand what the option does.
In this case it does talk about geninfo using the modified filenames so I suppose it should be expected that using --substitute breaks pretty much everything.
Then again it can also be used when aggregating data, which no longer runs geninfo, which works for only renaming the reported files. That is, however, not documented.
As usual: need a testcase which exhibits the issue.
We use --substitute
on every non-trivial project that I am aware of - so I'm quite confident that it works correctly for the use models we see.
~/test/base> lcov --capture --branch-coverage --base $(pwd) --no-external --directory . --output-file coverage.info
Capturing coverage data from .
geninfo cmd: '/usr/bin/geninfo . --output-filename coverage.info --base-directory /home/hramrach/test/base --no-external --memory 0 --branch-coverage'
Found gcov version: 7.5.0
Using intermediate gcov format
Writing temporary data to /tmp/geninfo_datZHXL
Scanning . for .gcda files ...
Found 1 data files in .
Processing ./a.gcda
Finished .info-file creation
~/test/base> lcov --capture --branch-coverage --base $(pwd) --substitute 's#/base##g' --no-external --directory . --output-file coverage.info
Capturing coverage data from .
geninfo cmd: '/usr/bin/geninfo . --output-filename coverage.info --base-directory /home/hramrach/test/base --no-external --substitute s#/base##g --memory 0 --branch-coverage'
Found gcov version: 7.5.0
Using intermediate gcov format
Writing temporary data to /tmp/geninfo_datRQTz
Scanning . for .gcda files ...
Found 1 data files in .
Processing ./a.gcda
Excluded data for 1 file due to include/exclude options
geninfo: WARNING: no data generated
Finished .info-file creation
~/test> lcov --capture --branch-coverage --base $(pwd) --substitute 's#/base##g' --no-external --directory base --output-file base/coverage.info
Capturing coverage data from base
geninfo cmd: '/usr/bin/geninfo base --output-filename base/coverage.info --base-directory /home/hramrach/test --no-external --substitute s#/base##g --memory 0 --branch-coverage'
Found gcov version: 7.5.0
Using intermediate gcov format
Writing temporary data to /tmp/geninfo_datQn0K
Scanning base for .gcda files ...
Found 1 data files in base
Processing base/a.gcda
geninfo: ERROR: unable to open /home/hramrach/test/a.c: No such file or directory
(use "geninfo --ignore-errors source ..." to bypass this error)
Finally, using a superfluous merge step there is no complaint:
~/test/base> lcov --capture --branch-coverage --base $(pwd) --no-external --directory . --output-file coverage0.info
~/test/base> lcov --substitute 's#/base##g' --add-tracefile coverage0.info --output coverage.info
~/test/base> lcov --list coverage.info
|Lines |Functions |Branches
Filename |Rate Num|Rate Num|Rate Num
==================================================
[/home/hramrach/test/]
a.c |50.0% 2| 0.0% 1| - 0
==================================================
hramrach@localhost:~/test/current> lcov --capture --branch-coverage --base $(pwd) --no-external --directory . --output-file coverage0.info
Capturing coverage data from .
geninfo cmd: '/usr/bin/geninfo . --output-filename coverage0.info --base-directory /home/hramrach/test/current --no-external --memory 0 --branch-coverage'
Found gcov version: 7.5.0
Using intermediate gcov format
Writing temporary data to /tmp/geninfo_datYFCb
Scanning . for .gcda files ...
Found 1 data files in .
Processing ./a.gcda
Finished .info-file creation
hramrach@localhost:~/test/current> lcov --substitute 's#/current##g' --add-tracefile coverage0.info --output coverage.info
Combining tracefiles.
.. found 1 files to aggregate.
Merging coverage0.info..0 remaining
Writing data to coverage.info
Summary coverage rate:
lines......: 100.0% (2 of 2 lines)
functions..: 100.0% (1 of 1 function)
branches...: no data found
hramrach@localhost:~/test/current> lcov --list coverage.info
|Lines |Functions |Branches
Filename |Rate Num|Rate Num|Rate Num
==================================================
[/home/hramrach/test/]
a.c |50.0% 2| 0.0% 1| - 0
==================================================
Total:|50.0% 2| 0.0% 1| - 0
As already said the --substitute
documentation does say that the option affects where sources are searched for, and it probably works great for cases when the source and the binaries are in different directories because of out-of-tree build.
What is not clear is how to do substitution when the goal is to show different file path in the report but the source and binary are in the same directory - that is not substitute in the source search path, only in the report.
I might be misunderstanding what you are trying to do.
Brief background. lcov wants to see your source code, for two primary purposes:
lcov --capture
enables 'region' filtering by default - which is why it went to look for the source code. See the --no-markers
option, if you want to disable that.Generally, the tools give you a 'source' error if they cannot find the source code.
If you don't care about the error - say, because you aren't trying to filter and you aren't trying to display the source - then you can suppress the message (see the --ignore-errors
option).
I think that this may be what you want, in one of your commands, above - but you are probably better off to just disable filtering if that is the case.
The basic purpose of the --substitute
option - as well as the --source-directory
option - is to help the tool find the source files - and, to a lesser extent, to figure out if the same file is known by multiple aliases.
Finding the sources can be problematic, as the compiler embeds some path information in the gcov data and the coverage runtime can munge them further.
If you only wanted to change the reported source file path names: then --substitute
may or may not do what you need, depending on what else you are trying to do at the same time.
If this still seems to be not working as you expect: please include the testcase source code and your compilation command.
I probably need your execution environment as well - especially if you were using GCOV_PREFIX
and/or GCOV_PREFIX_STRIP
.
The command to compile was
gcc -Wall -fprofile-arcs -ftest-coverage a.c -o a
and the source is
int main()
{
return 0;
}
This is only a test case to replicate the issue.
I use a tool other than genhtml to produce HTML, and it has assumptions about mapping of the in-report filename and the web URL that is only fulfilled when the subdirectory is stripped in the report. That's required for hyperlinks in the report to work.
The subdirectory is required because it's writing (or at least attempting) a report that compares the coverage between the two revisions 'base' and 'current'.
It certainly has problems and I am not sure it is a good idea to use this specific tool yet. Nonetheless, these requirements are not particularly exotic.
I fear that I somewhat mislead you about what is happening in your example - sorry about that. I added a bit more logging, to make what is happening a bit more clear.
The upshot is that the --no-external
flag that you passed is telling lcov to filter out files that are NOT in some set of directories.
That set is set to whatever name you gave to the (optional) --base-directory
flag, as well as all the directories you passed in lcov --directory ...
options.
The tricky bit happens next:
gcov
output and finds source file names--include
or --exclude
pattern - or because you are 'external' to the project.a.c
gets excluded because it is in the (substituted) directory \<HOME\>test
- and not in the 'internal' directory `\<HOME>/test/base'.With the additional logging, the test output looks like:
$ lcov -o foo.info --capture -d . --base `pwd` --substitute 's#/base##g' --no-external --branch --build-dire `pwd` --source `pwd`
Capturing coverage data from .
geninfo cmd: '.../lcov/bin/geninfo . --output-filename foo.info --build-directory /home/hcox/coverage/lcov/tests/lcov/base --base-directory /home/hcox/coverage/lcov/tests/lcov/base --no-external --substitute s#/base##g --branch-coverage'
Found gcov version: 10.2.0
Using intermediate gcov format
Recording 'internal' directories:
/home/hcox/coverage/lcov/tests/lcov/base
Writing temporary data to /tmp/geninfo_datLdea
Scanning . for .gcda files ...
Found 1 data files in .
using: chunkSize: 1, nchunks:1, intervalLength:0
Dropping 'external' file '/home/hcox/coverage/lcov/tests/lcov/a.c'
Finished processing 1 GCDA file
Excluded data for 1 file due to include/exclude options
geninfo: ERROR: (empty) no data generated
(use "geninfo --ignore-errors empty ..." to bypass this error)
Message summary:
1 error message:
empty: 1
In your case, the easiest approach is likely to remove --no-external
- which is somewhat of a blunt instrument, and not so easily controlled - and replace with --include
/--exclude
regexps instead.
I think that having some additional logging in this area is a Good Idea, though - as clearly it confused me as well as confusing you. I will check that in along with some other fixes and enhancements.
Henry
I use a tool other than genhtml to produce HTML, and it has assumptions about mapping of the in-report filename and the web URL that is only fulfilled when the subdirectory is stripped in the report. That's required for hyperlinks in the report to work.
I'm rather curious about that tool...what it is, what its features are. You may or may not be familiar with lcov's differential categorization and binning features. I wonder if this is similar - and what new ideas it can teach me.
I appreciate any information you can share (publicly or privately)
Henry
The tool in question is https://github.com/romeovs/lcov-reporter-action
The setup can be seen in full detail in https://github.com/hramrach/lm-sensors/pull/2
The results are not quite satisfactory, native lcov reports provide more data. There is something lost when the report is converted by the tool. It does, however, generate the correct github URLs for the data it does report.
Thanks...I took a quick look just now - and will take a deeper look after I deal with a few more burning issues.
In addition to - or instead of - looking at lcov's differential reporting, you might also want to look at the genhtml --select-script ...
feature - which we use for a rather similar code review methodology.
The upshot is that one can ask for an HTML report that shows only differences between points of interest (say, SHAs): is there new code which is not exercised? Is there unchanged code which is no longer exercised? Are there gains? And so on.
If there is no negative change (your coverage criteria is met and your other regressions pass): then the commit can be integrated automatically (no manual inspection or action required).
If there is a problem (e.g., untested modified code) - then you can either fix it or sign off.
While the full report contains the same information as the subset, it is also typically much, much larger and thus potentially more complicated to understand and to navigate. (For example: a recent review subset was about 8600 LOC in 4 directories - including both changes and context - whereas the full report is about 370K lines in many more directories. The difference is even more stark for smaller features.) OTOH: the subset contains only limited context - and so might be hard to understand at times. Both formats can be useful.
One thing ls that lcov tool itself does not mention differential coverage at all. It's only supported by genhtml which I did not need to look at because I do have a HTML generation tool.
The other thing is that the genhtml tool requires a diff file to produce the differential coverage report.
You might think that a diff would provide some insight into the code changes but the fact is that it does not. genhtml could apply arbitrary heuristic to determine why the coverage has changed in face of non-identical code between base and current revision and it pushes that heuristic on the diff tool. The diff tool cannot tell what the intent behind changes was, it only tries to apply some heuristic to determine which parts of the code are identical between the two revisions. It deals better with some kinds of modifications, and very poorly with others.
The https://github.com/romeovs/lcov-reporter-action tool does not need a diff to generate a differential report, and while this particular tool is not good at it other cloud-based tools seem to do a decent job.
Overall I think that the differential reporting functionality in lcov is needlessly complex giving rise to wealth of replacement tools.
Finally, https://github.com/romeovs/lcov-reporter-action aims at github integration which is a specialized task that genhtml does not provide.
If rather than genhtml lcov itself or other separate tool not tied to a specific HTML output provided the differential reports as part of lcov it would make it much easier to provide reports in varying formats, such as github comments.
I think we will have to agree to disagree. I'm glad that you are happy with your tool choices. It turns out that we use this one - albeit, with some additions - on a few projects.
The additional logging messages (to indicate files which have been excluded for one reason or another) can be found in f9b156155. I think that that addresses the root cause of the original issue.
If that turns out to be true: please close this issue. If not: please describe the remaining issue(s) in as much detail as possible, and append a testcase which illustrates the bug. Thanks Henry
It might be reasonable to provide a hint in the documentation that --substitute
can be also used --add-tracefile
after the geninfo
command has collected the data in the --capture
, perhaps to make use of it in a different environment where the sources are at a different place.
It would be helpful if you could suggest replacement wording - and even submit it as a PR. For the most part, my observation is that things which confuse one user, confuse others as well - so it is a Good Thing to get clarification, when that happens.
Note that the original intent of the --substitute
option (and, later, of the --resolve-script
option) was to make it possible to match file pathnames between build directory, gcc/gcov view (data found in the compiled code), and the revision control system.
Fundamentally: lcov is 100% driven by path strings: if your string name is identical then you are the same file. If you are not identical - then you are a different file. "./foo.c" and "foo.c" are not the same.
This all started because we were having some configuration issues between projects and builds, using multi-step lcov-then-sed scripts. Arguably: this is either cleaner/easier - or more complicated.
You can also see where the above lead to version checking - and so forth.
Closing this issue now. The actual cause has been identified, and the man pages updated to clarify how the --substitute
option works.
If there is still a problem, please either reopen this issue or file a new one.
Please include a testcase which illustrates the bug along with a description of how to reproduce.
Thanks
Henry
Is there any way to use this option at all?
With --no-external all files are considered external because filename after substitution is matched against base dir, and no data is recoreded.
When base dir is adjusted sources cannot be found because source after substitution is searched.