geninfo: Error mismatch

fletcher97 commented 1 year ago

I started getting this error and I'm not sure why...

geninfo: Warning: ('mismatch') mismatched exception tag for id 5, 5: '0' -> '1'

Command executed:

// command used
lcov -c -b . -d . -o report.info --no-external --rc lcov_branch_coverage=1 --filter branch,function
// which runs the following geninfo command, the one that actually produces the error:
/usr/bin/geninfo . --output-filename report.info --base-directory . --filter branch,function --no-external --rc lcov_branch_coverage=1 --parallel 1 --memory 0 --branch-coverage

I have more or less narrowed down the code that causes this. It's the following loop:

for (std::list<flt::ITestable*>::iterator it = this->_tests.begin(); it != this->_tests.end(); it++) {
    ...
}

Running lcov triggers the error even when the loop body is empty. When I comment out the for loop, the error goes away. This only happens when branch coverage is enabled.
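A self-contained program with the same loop shape may be a useful starting point for a minimal reproduction. The ITestable and Suite names below are hypothetical stand-ins for the reporter's flt::ITestable and the class that owns _tests; whether this particular code actually triggers the mismatch is an assumption and will likely depend on the compiler version, optimization level and how the code is split across translation units.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for flt::ITestable from the report.
struct ITestable {
    virtual ~ITestable() = default;
};

// Hypothetical owner of the test list (the reporter's class holds `_tests`).
struct Suite {
    std::list<ITestable*> tests;

    // Same loop shape as in the report: explicit iterator, post-increment.
    // The body only counts iterations so the loop has an observable effect.
    int run() {
        int visited = 0;
        for (std::list<ITestable*>::iterator it = tests.begin(); it != tests.end(); it++) {
            ++visited;
        }
        return visited;
    }
};
```

One way to exercise it is to call Suite::run() from a separate main.cpp, build both files with g++ --coverage, run the binary, and then run the lcov command shown above.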

The weird part is that the for loop is not in main.cpp, but the mismatch is reported for main.gcda. Even weirder, the error occurs even if the code is unreachable.

I'm at a loss and have no idea what could be wrong. I know I can just use --ignore-errors mismatch, but I'd rather fix the issue. I'm also not sure whether using that flag could discard some information, or whether future errors hidden by the flag could silently discard coverage information.

henry2cox commented 1 year ago

As you figured out, the message is telling you that lcov is confused: it found some branch data for that particular line which says "this branch is related to an exception" and some other data which says "this branch is not related to an exception". Clearly both statements cannot be true, and lcov doesn't know what to do.

There are only a few (less than optimal) things you can do about it:

compile your code to not use exceptions (depending on your application, this may not be possible)
add --rc no_exception_branch=1 to your command line. This tells lcov to drop any branches that the compiler (and gcov) marked as 'exception related', so you won't hit the error: one of the two conflicting pieces of branch data will have been dropped.
add an exclusion directive to that particular line: ...it++) { // LCOV_EXCL_EXCEPTION_BR_LINE (or exclude all branches on that line via LCOV_EXCL_BR_LINE)
I don't want to tell you about a really ugly hack to refactor your code to enable lcov to notice that there are no conditionals. So I won't mention it.
sort through gcc/gcov and/or llvm and see if the exception branch data can be made consistent ;-) (yeah - not holding my breath; if it were easy, it would have already been done).
OR: just tell lcov to ignore the error - and move on with your life.
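For the exclusion-directive option, the marker is simply a trailing comment on the offending line. In this sketch count_items is a hypothetical function used only to show placement; LCOV_EXCL_EXCEPTION_BR_LINE and LCOV_EXCL_BR_LINE are the markers named in the list above.

```cpp
#include <cassert>
#include <list>

// Hypothetical function; only the trailing comment on the loop line matters.
// LCOV_EXCL_EXCEPTION_BR_LINE: lcov drops the exception-related branches on
// this line. LCOV_EXCL_BR_LINE instead would drop every branch on the line.
int count_items(const std::list<int>& items) {
    int n = 0;
    for (std::list<int>::const_iterator it = items.begin(); it != items.end(); it++) { // LCOV_EXCL_EXCEPTION_BR_LINE
        ++n;
    }
    return n;
}
```

The narrower marker is usually preferable: LCOV_EXCL_BR_LINE would also hide the ordinary loop-condition branch, which is genuine coverage information.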

Sorry to be less than helpful.

Henry

fletcher97 commented 1 year ago

This might be a stupid question, but I can run gcov fine and it parses everything without problems. I understand the issue of having a branch marked as both exception and non-exception. What I don't get is how gcov can parse it with no errors when lcov can't.

henry2cox commented 1 year ago

I think that the stupid questions are the ones that don't get asked - and thus lead to long term misunderstanding and/or a lot of later debugging and rework.

I think that the key point you missed is that lcov is really just calling gcov under the hood. The actual flow is:

The compiler (gcc or llvm, say) is passed extra flags and writes compile-time coverage data to a .gcno file. You get one such file for each compilation unit.
At runtime (in an atexit callback), the coverage runtime arranges to write a .gcda file which contains runtime coverage data (counter values and what have you). You get one such file for each compilation unit that is linked into your exe. ('atexit' matters here: if your executable crashes, or something else bad happens and the atexit handlers are not called, then you get no data.)
We then run gcov to combine the data in the .gcno and .gcda files and produce something we like (in this case, a machine-readable JSON file). That data contains both compile-time and runtime data: locations of the various coverpoints, lists of branches associated with a particular expression, annotations saying whether each branch is related to an exception or not, and so forth.

The error message that lcov is giving you says that it parsed the gcov output data and found that the data was inconsistent. What actually happens is that data for a particular source file can appear in multiple .gcno/.gcda files (for example, inline or template code from a header that is compiled into several translation units). We want to combine that into a single report/single number, but we find some inconsistency when we try.
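To make the inconsistency concrete, here is a toy model of combining per-file branch data. It is not lcov's implementation (geninfo is written in Perl); the Branch record, the (line, branch id) key and merge_strict are hypothetical. Counts from each data set are summed, and disagreement on the exception flag is treated as an error, which is roughly the situation the mismatch message reports.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy model, not lcov's implementation: one branch record for a source line.
struct Branch {
    long count;         // execution count from one .gcda
    bool is_exception;  // gcov's "related to an exception" mark
};

// Key: (source line, branch id on that line).
using BranchKey = std::pair<int, int>;
using BranchMap = std::map<BranchKey, Branch>;

// Fold `incoming` (data for the same source file from another .gcno/.gcda
// pair) into `total`. Counts are summed; if the two sides disagree on
// is_exception, there is no single correct answer, so report an error.
void merge_strict(BranchMap& total, const BranchMap& incoming) {
    for (const auto& kv : incoming) {
        auto it = total.find(kv.first);
        if (it == total.end()) {
            total.emplace(kv.first, kv.second);
        } else if (it->second.is_exception != kv.second.is_exception) {
            throw std::runtime_error("mismatched exception tag at line " +
                                     std::to_string(kv.first.first));
        } else {
            it->second.count += kv.second.count;
        }
    }
}
```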

(LLVM supports the above model as well as a similar model that uses different file formats. The basic lcov idea is the same, though. Unfortunately, llvm is not bug-free either, and is also not entirely consistent.)

We use coverage data to drive the verification/validation process - so it is extremely important that the data be correct and reliable. Escapes are just WAY too costly (monetary as well as reputational - not to mention stressful). As a result: we try to check everything - but also to leave escape hatches ("sign off") so that errors can be ignored (once we decide that the tools are wrong and the chip is OK). Your priorities and your development process might be different.

fletcher97 commented 1 year ago

I understand the flow better now, but I understand even less why it doesn't work.

When I run gcov, I get all the coverage info about the code in .gcov files. I guess you are not using them directly but using an intermediate machine readable json file instead, as you said above. Since gcov manages to create those .gcov files from the same inputs (.gcno, .gcda), why can't lcov? Is it okay to assume that gcov doesn't take into consideration mismatches and ignores them silently? Or does gcov not join multiple files while lcov does, and thus avoid the issue? Or is it the JSON generated from the note and data files that has incorrect information?

I want to test this a bit further. Adding the verbose and debug flags doesn't give much info on what lcov is doing. Is there an easy way to get more debug info or keep the temporary files lcov generates? How can I find the exact command lcov uses to invoke gcov? Is there a way for lcov to tell what's the exact line in the code that has this conflict?

Finally, when I use the ignore-error flag, what does lcov do exactly? From what I understand, the error appears because the info about one branch says it both comes from an exception and does not. Does lcov merge both even though they disagree? Does it split them into separate branches and report them individually? Or is the mismatched branch info dropped?

henry2cox commented 1 year ago

I guess you are not using them directly but using an intermediate machine readable json file instead

Not quite. When passed the -i flag, gcov produces an 'intermediate format' result file in JSON format (...after some GCC version. Slightly earlier versions produce a different text intermediate format, and versions earlier than that don't support the '-i' flag).

Is it okay to assume that gcov doesn't take into consideration mismatches and ignores them silently?

Yes. I believe that this is what happens (but I have not checked the gcov implementation to be certain what it does. We observe that it doesn't care about inconsistent branch marks, though.)

Is there an easy way to get more debug info or keep the temporary files lcov generates

perl -d geninfo ... will run under the perl debugger, which gives pretty much infinite control but is not always easy to use and requires some knowledge of the implementation.

geninfo --preserve ... (or lcov --capture --preserve ...) will save the temporary files. You may also want to specify --tempdir somePath so you can control where they get written (and don't see weird generated names in /tmp).

How can I find the exact command lcov uses to invoke gcov?

geninfo --debug ... (or lcov --capture --debug ...) will print the gcov tool command. Look for output lines of the form "call gcov: ....."

Is there a way for lcov to tell what's the exact line in the code that has this conflict?

Yes (I added that to my sandbox but haven't pushed it yet). Note that this may not be completely helpful, because it can only tell you the source file and line where it sees the issue. This part of the code doesn't know the names of the .gcno/.gcda files where the mismatch was detected, nor the names of the files where the original data was generated.

Finally, when I use the ignore-error flag, what does lcov do exactly?

Right now, when you ignore the error, we just merge the count data and ignore the 'is_exception' flag: the resulting data keeps the flag value from whichever dataset was seen first. As is, the result is unpredictable, especially when using multiple threads, so I will change that to always remove the flag.
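The current and the planned behaviour under the ignored error can be sketched with a toy branch record. This is not lcov's code: Branch, OnMismatch and merge_ignoring are hypothetical names. KeepFirst mirrors "the flag value from whichever dataset was seen first", and DropFlag mirrors the planned "always remove the flag".

```cpp
#include <cassert>
#include <map>
#include <utility>

// Toy record, not lcov's code.
struct Branch {
    long count;
    bool is_exception;
};
using BranchMap = std::map<std::pair<int, int>, Branch>;  // (line, id) -> data

enum class OnMismatch {
    KeepFirst,  // current behaviour: whichever flag was seen first survives
    DropFlag    // planned behaviour: a conflict clears the exception flag
};

// Merge with the mismatch error ignored: counts are always summed, and the
// policy decides what happens to is_exception when the inputs disagree.
void merge_ignoring(BranchMap& total, const BranchMap& incoming, OnMismatch policy) {
    for (const auto& kv : incoming) {
        auto it = total.find(kv.first);
        if (it == total.end()) {
            total.emplace(kv.first, kv.second);
            continue;
        }
        Branch& dst = it->second;
        if (dst.is_exception != kv.second.is_exception && policy == OnMismatch::DropFlag) {
            dst.is_exception = false;
        }
        // KeepFirst: dst.is_exception is left untouched.
        dst.count += kv.second.count;
    }
}
```

With KeepFirst, merging A into B and B into A can yield different flags, which is why the outcome depends on processing order (and so on thread scheduling). DropFlag gives the same answer either way.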

You are also correct that a Better Idea (tm) might be to keep the data sets separate (not merge) when they appear to be conflicting. I will look into how hard it is to do that.
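One hypothetical way to keep conflicting data sets separate is to make the exception flag part of the key, so that records disagreeing on it never collapse into one entry. SplitKey, SplitMap and merge_split are illustrative names only; this is a sketch of the idea, not how lcov would necessarily implement it.

```cpp
#include <cassert>
#include <map>
#include <tuple>

// Hypothetical "don't merge conflicting data" variant: the exception flag is
// part of the key, so records that disagree on it are kept as separate entries.
using SplitKey = std::tuple<int, int, bool>;  // (line, branch id, is_exception)
using SplitMap = std::map<SplitKey, long>;    // key -> summed execution count

void merge_split(SplitMap& total, const SplitMap& incoming) {
    for (const auto& kv : incoming) {
        total[kv.first] += kv.second;  // a missing key starts at 0
    }
}
```

A report built from such data would show two entries for the same branch, one per flag value, instead of silently picking one.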

Henry

xaizek commented 1 year ago

but I have not checked the gcov implementation to be certain what it does

https://github.com/gcc-mirror/gcc/blob/0f3b4d38d4bad8994150fe7a1e5428055d29a4bf/gcc/gcov.cc#L2382-L2409 and https://github.com/gcc-mirror/gcc/blob/0f3b4d38d4bad8994150fe7a1e5428055d29a4bf/gcc/gcov.cc#L2705-L2730

In other words, gcov starts by marking all branches as exceptional and then removes the mark from some of them. When combining the data, it only ever marks lines as unexceptional. So, faced with conflicting reports, "unexceptional" wins.
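The gcov behaviour described above amounts to a logical AND over the reports being combined. The function below is a paraphrase of that description, not code taken from gcov.cc, and merged_exception_flag is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Paraphrase of the described gcov behaviour, not code from gcov.cc:
// the mark starts as "exceptional" and can only ever be cleared, so one
// "unexceptional" report is enough to make the merged result unexceptional.
bool merged_exception_flag(const std::vector<bool>& reports) {
    bool is_exception = true;  // initial mark (also the result for no reports)
    for (bool r : reports) {
        if (!r) {
            is_exception = false;
        }
    }
    return is_exception;
}
```

Unlike keep-first merging, this rule gives the same result regardless of the order in which the reports are combined, which is also why gcov never needs to complain about the conflict.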

The code doesn't have comments explaining the behaviour, but I can speculate that when optimizations remove code, too many blocks stay marked as exceptional, and preferring unexceptional in reports might be an intentional correction for that. No idea if that's correct or not :)

henry2cox commented 1 year ago

Pushed 1c16cc36b45a, which prints the source code location of the inconsistency.

linux-test-project / lcov, issue #209: geninfo: Error mismatch