kalibera / rchk


Is it possible to restrict the analysis to certain functions only? #14

Open gaborcsardi opened 5 years ago

gaborcsardi commented 5 years ago

This would be nice when one is trying to fix a problem, because analyzing all functions of a big package takes quite a long time.

kalibera commented 5 years ago

In principle this would be possible to some extent. Some of it is already available, but only internally: it requires recompilation, and I use it for debugging the tool. However, some of the required analyses are whole-program and have to be done on every run, so the actual performance improvement would vary. The tool could in principle be extended or redesigned to serialize and deserialize analysis results and reuse some of them, depending on how much of the code has changed, but that would be too much work and not worth the effort.

As with any bug-finding tool based on static analysis, rchk cannot find all (PROTECT) errors. It is most useful when one thinks for a while about the reports and the code around them, and about similar code patterns elsewhere that may also be wrong but perhaps are not reported. In my experience, when I am asked to comment on suspected false alarms, I often find several true PROTECT errors just a few lines of code away, even when the report itself turns out to be a false alarm. I have no theoretical explanation, but it seems that code complex enough to confuse the tool is also much more likely to have real bugs. Given this usage scenario, waiting a minute or two for one package to be checked should not be limiting: one should definitely not just be looking for minor guess-edits that happen to silence the reports but may not fix the real problems.

In some cases you can reduce the analysis time by lowering the maximum number of states. When your package has functions whose checking always runs out of memory, lowering the maximum is a safe way to reduce the checking time, as long as the number of functions that run out does not increase. Historically, I have also observed better performance of the tool when it is compiled with gcc rather than with clang.

One may also use rchk to annotate the source code with the places where it sees calls to allocating functions; this helps with the human interpretation of the results. Ideally, though, one would write code conservatively and assume that any function call may allocate from the R heap. The tool itself is more precise and performs analyses to find out which calls may allocate, but in some cases it, too, has to be conservative and assume allocation (e.g. for calls via function pointers).