linux-test-project / lcov


geninfo: Error mismatch #209

Closed fletcher97 closed 1 year ago

fletcher97 commented 1 year ago

I started getting this error and I'm not sure why...

geninfo: Warning: ('mismatch') mismatched exception tag for id 5, 5: '0' -> '1'

Command executed:

// command used
lcov -c -b . -d . -o report.info --no-external --rc lcov_branch_coverage=1 --filter branch,function
|
V
// Actual command that gives out the error
/usr/bin/geninfo . --output-filename report.info --base-directory . --filter branch,function --no-external --rc lcov_branch_coverage=1 --parallel 1 --memory 0 --branch-coverage

I more or less found the code that is causing this. The issue is the following loop:

for (std::list<flt::ITestable*>::iterator it = this->_tests.begin(); it != this->_tests.end(); it++) {
    ...
}

Running lcov triggers the error even when the loop body is empty. When I comment out the for loop, the error goes away. This only happens when branch coverage is enabled.

The weird part is that the for loop is not in main.cpp, yet the mismatch is reported for main.gcda. Even weirder, the error occurs even if the code is unreachable.

I'm at a loss and have no idea what could be wrong. I know I can just use --ignore-errors mismatch, but I'd rather fix the issue. I'm not sure whether that flag causes some information to be discarded, or whether it could hide other errors in the future that would themselves discard coverage information.

henry2cox commented 1 year ago

As you figured out: the message is telling you that lcov is confused, because it found some branch data for that particular line which says "this branch is related to an exception" and some other data that says "this branch is not related to an exception". Clearly: both statements cannot be true...and lcov doesn't know what to do.
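To make the conflict concrete, here is a minimal Python sketch of a strict merge that refuses to combine two records for the same branch when they disagree about the exception flag. This is not lcov's actual code (geninfo is written in Perl); the record layout, `merge_strict`, and `MismatchError` are hypothetical names invented for illustration, and the sample line and branch numbers are made up.

```python
from typing import Dict, Tuple

# Hypothetical record layout: (line, branch_id) -> (taken_count, is_exception)
BranchData = Dict[Tuple[int, int], Tuple[int, bool]]


class MismatchError(Exception):
    """Raised when two datasets disagree about a branch's exception flag."""


def merge_strict(a: BranchData, b: BranchData) -> BranchData:
    """Sum branch counts, but fail if the exception flags conflict."""
    merged = dict(a)
    for key, (count, is_exc) in b.items():
        if key not in merged:
            merged[key] = (count, is_exc)
            continue
        old_count, old_exc = merged[key]
        if old_exc != is_exc:
            line, branch = key
            raise MismatchError(
                f"mismatched exception tag for line {line}, branch {branch}: "
                f"'{int(old_exc)}' -> '{int(is_exc)}'"
            )
        merged[key] = (old_count + count, is_exc)
    return merged


# Two made-up datasets describing the same branch of the same source line,
# for example from two object files that both compiled the same header.
from_first = {(28, 5): (3, False)}
from_second = {(28, 5): (1, True)}

try:
    merge_strict(from_first, from_second)
    message = ""
except MismatchError as err:
    message = str(err)
```

The point of the sketch is only that the counts could be added, but there is no obviously right answer for the flag, so a strict tool reports the conflict instead of guessing.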

There are only a few (less than optimal) things you can do about it:

Sorry to be less than helpful.

Henry

fletcher97 commented 1 year ago

This might be a stupid question, but I can run gcov fine and it can parse everything with no problem... I understand the issue of having a branch marked as both excep and no-except. What I don't get is how gcov can parse it with no errors but lcov can't.

henry2cox commented 1 year ago

I think that the stupid questions are the ones that don't get asked - and thus lead to long term misunderstanding and/or a lot of later debugging and rework.

I think that the key point that you missed is that lcov is really just calling gcov under the hood. The actual flow is:

The error message that lcov is giving you is saying that it parsed the gcov output data - and found that the data was inconsistent. What actually happens is that data for a particular source file can appear in multiple .gcno/.gcda files - and we want to combine that into a single report/single number - but we find some inconsistency when we try.

(LLVM supports the above model as well as a similar model that uses different file formats. The basic lcov idea is the same, though. Unfortunately, llvm is not bug free either - and is also not entirely consistent.)

We use coverage data to drive the verification/validation process - so it is extremely important that the data be correct and reliable. Escapes are just WAY too costly (monetary as well as reputational - not to mention stressful). As a result: we try to check everything - but also to leave escape hatches ("sign off") so that errors can be ignored (once we decide that the tools are wrong and the chip is OK). Your priorities and your development process might be different.

fletcher97 commented 1 year ago

I understand the flow better now, but I understand less why it wouldn't work.

When I run gcov I can get all the coverage info about the code in .gcov files. I guess you are not using them directly but using an intermediate machine readable json file instead as you said above. Since gcov managed to create those .gcov files with the same info (.gcno, .gcda) why can't lcov? Is it okay to assume that gcov doesn't take into consideration mismatches and ignores them silently? Or it doesn't join multiple files while lcov does and thus doesn't have the same issue? Or is it the json generated from the note and data files that has incorrect information?

I want to test this a bit further. Adding the verbose and debug flags doesn't give much info on what lcov is doing. Is there an easy way to get more debug info or keep the temporary files lcov generates? How can I find the exact command lcov uses to invoke gcov? Is there a way for lcov to tell what's the exact line in the code that has this conflict?

Finally, when I use the ignore-error flag, what does lcov do exactly? From what I get the error appears because the info about one branch says it comes both from an exception and not from an exception. Does it merge both even though they say they are different? Does it split them into different branches and reports them individually? Is the mismatched branch info dropped?

henry2cox commented 1 year ago

> I guess you are not using them directly but using an intermediate machine readable json file instead

Not quite. When passed the -i flag, gcov produces an 'intermediate format' result file in JSON format (...after some GCC version. Slightly earlier versions produce a different text intermediate format, and versions earlier than that don't support the '-i' flag).
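For anyone who wants to inspect that intermediate data by hand, here is a small Python sketch that lists the branches flagged as exception-related. The field names (`files`, `lines`, `line_number`, `branches`, `throw`) are my understanding of the JSON layout documented for recent gcov versions, so treat them as an assumption and check them against the output of your own GCC. The sample record is made up for illustration, and `exception_branches` is a hypothetical helper, not part of lcov or gcov.

```python
import json

# Made-up sample in the assumed gcov JSON layout (not real tool output).
sample = json.loads("""
{"files": [{"file": "main.cpp", "lines": [
  {"line_number": 28, "count": 4, "branches": [
    {"count": 3, "throw": false, "fallthrough": true},
    {"count": 0, "throw": true, "fallthrough": false}
  ]}
]}]}
""")


def exception_branches(report):
    """Return (file, line, branch_index) for every branch marked 'throw'."""
    found = []
    for source in report.get("files", []):
        for line in source.get("lines", []):
            for index, branch in enumerate(line.get("branches", [])):
                if branch.get("throw"):
                    found.append((source["file"], line["line_number"], index))
    return found


flagged = exception_branches(sample)
```

Running the same helper over the data produced from two different .gcda files and comparing the results for a shared source file is one way to see where the flags disagree.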

> Is it okay to assume that gcov doesn't take into consideration mismatches and ignores them silently?

Yes. I believe that this is what happens (but I have not checked the gcov implementation to be certain what it does. We observe that it doesn't care about inconsistent branch marks, though.)

> Is there an easy way to get more debug info or keep the temporary files lcov generates

perl -d geninfo ... will run under the perl debugger, which gives pretty much infinite control but is not always easy to use and requires some knowledge of the implementation.

geninfo --preserve ... (or lcov --capture --preserve ...) will save the temporary files. You may also want to specify --tempdir somePath so you can control where they get written (and don't see weird generated names in /tmp).

> How can I find the exact command lcov uses to invoke gcov?

geninfo --debug ... (or lcov --capture --debug ...) will print the gcov tool command. Look for output lines of the form "call gcov: ....."

> Is there a way for lcov to tell what's the exact line in the code that has this conflict?

Yes (I added that to my sandbox but haven't pushed it yet). Note that this might not be completely helpful, because it can only tell you the source file and line where it sees the issue. This part of the code doesn't know the names of the .gcno/.gcda files where the mismatch was detected, nor does it know the names of the files where the original data was generated.

> Finally, when I use the ignore-error flag, what does lcov do exactly?

Right now, when you ignore the error, then we just merge the count data and ignore the 'is_exception' flag. The resulting data will have the flag value from whichever dataset was seen first. I will change that to always remove the flag. As is, the result will be unpredictable, especially when using multiple threads.
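The order dependence described above can be shown with a short Python sketch. It is only a model of the behaviour as described in this comment, not lcov's implementation: `merge_lenient` and its `clear_flag_on_conflict` parameter are hypothetical names, and the data is made up. With the default, the first dataset seen decides the flag; with the option enabled, any conflict clears the flag, which makes the result independent of merge order.

```python
def merge_lenient(datasets, clear_flag_on_conflict=False):
    """Sum counts for each branch key and tolerate conflicting flags.

    Each dataset maps (line, branch_id) -> (taken_count, is_exception).
    """
    merged = {}
    for data in datasets:
        for key, (count, is_exc) in data.items():
            if key not in merged:
                merged[key] = (count, is_exc)
                continue
            old_count, old_exc = merged[key]
            if old_exc != is_exc and clear_flag_on_conflict:
                flag = False      # conflict: drop the exception mark
            else:
                flag = old_exc    # keep whatever was seen first
            merged[key] = (old_count + count, flag)
    return merged


a = {(28, 5): (3, False)}
b = {(28, 5): (1, True)}

first_wins_ab = merge_lenient([a, b])
first_wins_ba = merge_lenient([b, a])
cleared_ab = merge_lenient([a, b], clear_flag_on_conflict=True)
cleared_ba = merge_lenient([b, a], clear_flag_on_conflict=True)
```

In both variants the counts are identical; only the flag differs, which is why the "first seen wins" behaviour becomes unpredictable once files are processed in parallel.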

You are also correct that a Better Idea (tm) might be to keep the data sets separate (not merge) when they appear to be conflicting. I will look into how hard it is to do that.

Henry

xaizek commented 1 year ago

> but I have not checked the gcov implementation to be certain what it does

https://github.com/gcc-mirror/gcc/blob/0f3b4d38d4bad8994150fe7a1e5428055d29a4bf/gcc/gcov.cc#L2382-L2409 and https://github.com/gcc-mirror/gcc/blob/0f3b4d38d4bad8994150fe7a1e5428055d29a4bf/gcc/gcov.cc#L2705-L2730

In other words, gcov starts with marking all branches as exceptional and then removes marks from some. When combining the data it only marks lines as unexceptional. So faced with conflicting reports, "unexceptional" wins.
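A rough Python model of that policy, based only on the description above and not transcribed from gcov.cc: every branch starts out marked as exceptional, and a report can only clear the mark, never set it again. `combine_gcov_style` is a hypothetical name and the keys are made up.

```python
def combine_gcov_style(reports):
    """Combine exception marks so that 'unexceptional' always wins.

    Each report maps a branch key to its is_exception flag.
    """
    combined = {}
    for report in reports:
        for key, is_exc in report.items():
            combined.setdefault(key, True)  # start out marked exceptional
            if not is_exc:
                combined[key] = False       # the mark is only ever cleared
    return combined


key = (28, 5)
exc_then_plain = combine_gcov_style([{key: True}, {key: False}])
plain_then_exc = combine_gcov_style([{key: False}, {key: True}])
all_exceptional = combine_gcov_style([{key: True}, {key: True}])
```

Because the mark can only move in one direction, the outcome does not depend on the order in which the reports are combined, and no conflict is ever reported.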

The code doesn't have comments explaining the behaviour, but I can speculate that when optimizations remove code, too many blocks stay marked as exceptional and preferring unexceptional in reports might be an intentional correction for that. No idea if that's correct or not :)

henry2cox commented 1 year ago

Pushed 1c16cc36b45a, which prints the source code location of the inconsistency.