linux-test-project / lcov

LCOV
GNU General Public License v2.0

lcov-1.15 version got stuck when collecting coverage and consumed 100% CPU #131

Closed qiuhaonan closed 3 years ago

qiuhaonan commented 3 years ago

I updated lcov from 1.13 to 1.15 and ran a coverage collection task, using --rc geninfo_no_exception_branch=1 to filter C++ exception branches. It ran for over 12 hours and seemed to be stuck somewhere. Searching for other people's experience, I found advice to use "--rc geninfo_gcov_all_blocks=0". Any idea?

henry2cox commented 3 years ago

In my builds, I needed to use ‘--rc geninfo_gcov_all_blocks=0’ in order to avoid a timeout – exactly as you suggest. What happened when you tried that?

Were the reports you got from 1.13 without that flag different from the ones you get from 1.15 with the flag? If so… do you have a testcase which exhibits the difference?

Incidentally: if you are using branch coverage with C++ code, you may also need to do some additional branch filtering – else the coverage report is pretty much not actionable.

Henry

qiuhaonan commented 3 years ago

After using --rc geninfo_gcov_all_blocks=0, gcov behaved normally.
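For reference, the capture step with both options might be assembled as in the sketch below. It only builds the argument list and does not run lcov. The directory and output file names are placeholders, and lcov_branch_coverage=1 is an assumption about how branch coverage was enabled, since the thread does not say.

```python
def lcov_capture_command(build_dir, output_file):
    """Build an lcov 1.x capture command line (hypothetical helper).

    build_dir and output_file are placeholder names, not taken from the thread.
    """
    return [
        "lcov", "--capture",
        "--directory", build_dir,
        "--output-file", output_file,
        # Assumption: branch coverage is switched on through this rc key.
        "--rc", "lcov_branch_coverage=1",
        # The setting that avoided the hang reported in this issue.
        "--rc", "geninfo_gcov_all_blocks=0",
        # The exception-branch filter mentioned in the first comment.
        "--rc", "geninfo_no_exception_branch=1",
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True) on a machine where lcov 1.15 is installed.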

Checking several critical files, the reports from 1.13 and 1.15 seemed to be the same.

It seemed that the geninfo_gcov_all_blocks flag had no impact on coverage statistics, though the lcovrc manual for 1.15 says that it works with gcov to reduce statistics and avoid some bugs.

[image]

In fact, we did use it with C++ code. It really did not work so well, especially for template functions / classes or embedded macros.

Haonan

henry2cox commented 3 years ago

When you say “did not work so well for C++” – what specific issues do you see?

Coverage tools in general work rather badly for macros (…which are best avoided when possible). In our projects, we don’t have any particular issues with templates.

We did find that branch coverage was completely unusable due to huge numbers of compiler-generated branches for things like exception handling. We also found function coverage metrics less than helpful in template code (e.g., because some methods are unused in certain template instantiations). To handle both of these, I implemented a few filtering hacks that pretty much made the problems go away. Your code base and usage might be different such that more work is required; if that is the case – then I’d be interested to know what else could or should be done (…because we would very likely benefit from such enhancements as well).

Thanks

Henry

qiuhaonan commented 3 years ago

The main problems came from template classes / functions.

  1. The constructor of a template class in our code could generate 500+ branches. This class simply inherited from a pure virtual class and had 6 template parameters (uint32_t or pointer type) and 6 members (uint32_t, mutex, slotvec).

  2. A very simple conditional in a template function ("if (a == b)", where a and b were of type uint32_t) could also generate 100+ branches.

  3. The end of a namespace or a destructor also generated 4 or more branches.

Though they were "reasonable" at a low level, we could not selectively "disable or filter" them. For every instantiation of a template class or function, it was hard to guarantee high branch coverage.

Basically, we wanted to check whether we had executed simple conditionals such as "if...else, switch, do...while, while, for or foreach". And for templates, are there possible methods to bypass checking them?

Currently, we can only skip files which contain only template classes / functions, or write many "LCOV_EXCL_BR_START" and "LCOV_EXCL_BR_STOP" markers to bypass checking them.

henry2cox commented 3 years ago

Interesting. I think that we too have seen part of the problem that you describe – but we do not see some of the others.

Specifically:

o “if (a == b)” will generate just two branches when ‘a’ and ‘b’ are integer types. I’m curious why you are seeing very different branch counts here. Does this happen everywhere? Which compiler and binutils version? Are you using some unusual compile flags? We do see such counts in Verilog code, when we tell the simulator to blow up the vectors – but I don’t think you are looking at Verilog.

o If ‘a’ and/or ‘b’ are classes which do fun things with operator==, conversion operators, etc – then there may be a lot of additional branches generated by the compiler – for example, exception return handling. These we handle by filtering.

o If a and b are template parameters – then it is quite likely that the compiler will evaluate at compile time (and there will be no branch in the code).

o I think the branches at the end of a namespace or in a destructor are implicitly generated by the compiler and are also related to exception handling. We remove them by filtering.

The issue with the implicit branches is that they are quite often unreachable even if you were willing to devote heroic effort to try to test them. Most of my users care only about the branches they explicitly wrote - and more-or-less believe that the compiler will generate correct code for exception handling, destructor calls, etc. We do want to test exception handlers as well – but that is a slightly different problem, I think.

Given the above: I had implemented a hack into my fork of lcov (…not yet merged back to master, unfortunately). The hack simply looks at the source code and removes all the “BRDA” records that correspond to lines which do not contain conditionals. For example: neither the closing brace of a scope nor a straightforward function call statement contains a conditional – so branch records on those lines will be dropped. The hack is pretty simple – and is easily defeated by a bloody-minded user or by complicated C++ code. In those cases – we either live with the sub-optimal result, insert ‘exclude’ directives, restructure the code slightly to make the filter function happy – or enhance the filter implementation to handle more cases.
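The idea can be illustrated with a minimal sketch. This is not the code from the fork; the regular expression and the function name filter_brda are assumptions made for illustration. It keeps a BRDA record only when the corresponding source line appears to contain a conditional, and it leaves the BRF/BRH summary lines untouched, so those would need to be recomputed in real use.

```python
import re

# Heuristic (an assumption, not lcov's actual rule): a line "contains a
# conditional" if it has a branching keyword or a short-circuit/ternary operator.
CONDITIONAL = re.compile(r"\b(if|else|switch|case|for|while|do)\b|&&|\|\||\?")


def filter_brda(tracefile_text, sources):
    """Drop BRDA records that sit on lines with no apparent conditional.

    tracefile_text: contents of an lcov .info tracefile.
    sources: dict mapping source path -> list of source lines (index 0 = line 1).
    """
    kept = []
    current = None  # source lines of the file named by the last SF: record
    for record in tracefile_text.splitlines():
        if record.startswith("SF:"):
            current = sources.get(record[3:])
        elif record.startswith("BRDA:") and current is not None:
            # BRDA:<line>,<block>,<branch>,<taken>
            line_no = int(record[5:].split(",", 1)[0])
            in_range = 1 <= line_no <= len(current)
            if in_range and not CONDITIONAL.search(current[line_no - 1]):
                continue  # e.g. a closing brace or a plain function call
        elif record == "end_of_record":
            current = None
        kept.append(record)
    return "\n".join(kept)
```

As noted above, a heuristic like this is easy to defeat: a keyword inside a comment or string literal keeps the record, and a conditional split across several lines may lose it.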

One can certainly argue that the filter approach is hacky. Unfortunately, lcov/branch coverage is not actionable for C++ code without it.

I hope this helps

Henry

qiuhaonan commented 3 years ago

Thanks for your advice. It really helps. We are trying to implement similar filters to improve our coverage statistics.

Yes, "if (a == b)" was in a template function. In a simple example like this, I called "tempfunc" 3 times with 3 different object types and 6 branches were generated... [image] This behavior was actually understandable.

We also ran into branches generated by the compiler and used "no_exception_branch" to filter them. From what I observed, "no_exception_branch" seemed to ignore the "throw" lines in the *.gcov file, like this... [image]
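A rough sketch of that observation, not lcov's implementation: assuming gcov was run with -b -c so that branch lines look like "branch  2 taken 0 (throw)", dropping the lines tagged "(throw)" leaves only the remaining branches. The function name drop_throw_branches is hypothetical.

```python
import re

# Matches gcov branch lines that are tagged as exception edges.
THROW_BRANCH = re.compile(r"^branch\s+\d+\s.*\(throw\)\s*$")


def drop_throw_branches(gcov_text):
    """Return the lines of a .gcov file without branches marked (throw)."""
    return [line for line in gcov_text.splitlines()
            if not THROW_BRANCH.match(line)]
```

Source lines, call records, and branches tagged "(fallthrough)" or left untagged pass through unchanged.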

After filtering exceptions, we found more "confusing" branches in normal functions, and they could not be filtered by the lcov tool or by simple filtering rules. Some branches could be explained by reading the assembly code, while others could not be understood...

Example code and results are here: [image] [image] [image]

At line 100, there were 22 branches for "params.find("name") != params.end()". With the gcov tool, we could get detailed results like this: [image]

For these branches, were there methods to filter them...?

henry2cox commented 3 years ago

Hi –

With respect to your second question: “…were there methods to filter them?”

It should not be too hard to download the ‘diffcov’ branch and try it out to see if the result is usable in your application or not.

I guess you already know this but:

o ‘params.find(“name”)’ is called multiple times… if you cache the value, then each lookup won’t show as many bogus entries

o params.find is already called – so you don’t really need to call operator[] (if you cached the return result)

o The conditionals of the second if statement subsume those from the first; the first one can be removed.

I hope this helps.

Henry

qiuhaonan commented 3 years ago

Thank you very much. I will try it.

I find it interesting to work on the lcov tool :) There are many optimizations to make in the code itself. It is long-term work for a large project.

Thanks.

Haonan