kalibera / rchk

102 stars 10 forks source link

... as it has address taken, results will be incomplete ... #16

Closed romainfrancois closed 5 years ago

romainfrancois commented 5 years ago

I'm starting to get more familiar with rchk (through rhub for now) and trying to systematically deal with its findings. This is very useful, and I'm making progress about the rchk findings in dplyr.

I've sent some initial pull requests to Rcpp as well:

Although I believe it needs much more than that. But results are better with these changes. I am currently at that stage with checking dplyr against a branch of Rcpp that has the fixes from both the PR above https://builder.r-hub.io/status/original/dplyr_0.8.0.9008.tar.gz-d2f2d6a9fcc34b5491d3829cc0521fa8 (this probably will disappear in a few days though).

I'm seeing many results similar to this:

Function SEXPREC* arrange_template<dplyr::NaturalDataFrame>(dplyr::NaturalDataFrame const&, dplyr::QuosureList const&, SEXPREC*)
  [UP] ignoring variable <unnamed var:   %24 = alloca %struct.SEXPREC*, align 8> as it has address taken, results will be incomplete 
  [UP] ignoring variable <unnamed var:   %26 = alloca %struct.SEXPREC*, align 8> as it has address taken, results will be incomplete 

Do I need to pay attention to those ? I don't what they mean.

kalibera commented 5 years ago

ignoring variable X as it has address taken means that some local variable, given by LLVM instruction (rchk did not find a good name for the variable) has address taken, so &x is used or in C++ it is passed by reference to some function. Rchk will not be able to analyze changes that happen via the pointer to the variable - that would be too difficult, in many cases impossible - so it just gives this warning that results will be incomplete. This means that chances are higher than usual that other reports for the same functions may be false alarms. In the CRAN checks I currently filter out all reports for functions that have a warning results will be incomplete - even though indeed I've seen cases when the reports were real errors.

kalibera commented 5 years ago

Re running rchk, I've updated the documentation on how to install locally (on Linux) and I've added a pre-built singularity container to check a single package (for Linux). I've also re-tested and updated the VirtualBox(/Docker) automated installation, VirtualBox can be used also on Windows and MacOS.

romainfrancois commented 5 years ago

Thanks. Could the filtering out be an option of rchk ?

kalibera commented 5 years ago

Some filtering could be an option of rchk but I want to keep the filtering done by CRAN checks separate. It can change any time, also I don't want people to program "against the tool", to look for changes that just happen to make the tool happy (which could accidentally be through confusing the tool so that it does not understand the function well and hence all reports for the function were filtered out), but rather think about what is wrong in the program and how to fix it. That way, one can learn more and find more bugs (some that the tool did not find). I don't expect that everyone implementing packages in C should run the tool, but rather read the reports from CRAN site and fix based on analyzing them.

romainfrancois commented 5 years ago

So you don't want to make your filtering public ? It would be nice to be able to reach the same results before going through cran.

lionel- commented 5 years ago

I don't expect that everyone implementing packages in C should run the tool, but rather read the reports from CRAN site and fix based on analyzing them.

This used to be a good workflow. Now that rchk is enforced on CRAN, it'd be helpful to reproduce results locally, to minimize the feedback loops between CRAN maintainers and package maintainers.

kalibera commented 5 years ago

The enforcement was needed because few hundred packages had reports for years without getting attention of their maintainers, when the majority of the reports (I looked at a random sample) were true errors.

Package maintainers who are passionate about their code being correct can indeed run rchk locally and address all the reports. They can simplify the code where the tool is confused. Possibly they will still have some false alarms after, so they can keep a record of those and share that with the CRAN team if they were asked (and such a case may impact the filters indeed). I've been also answering many questions about individual reports and how to fix.

The filters are primitive post-processing to remove some reports, to reduce the overhead on the CRAN team (and package maintainers), but they can hide real problems both directly and indirectly. They can change often and publishing them could invite temptations for abuse - evasion of some checks instead of addressing the problems or of course justifying they are really false alarms.

romainfrancois commented 5 years ago

Won't keeping these filters private encourage users to come up with their own filters, which might be of lesser quality.

kalibera commented 5 years ago

Users can have their filters, why not? They may be able to capture better what are the false alarms in their package, given their coding style etc. I have seen very few false alarms for C code and I would myself always simplify the code when the tool can not see it is correct. But with C++, this is different, some legitimate pattens the tool cannot understand (well I would not code in C++ for R, but that is for another discussion).

kalibera commented 5 years ago

You can also approach false alarms by implementing a diff (look at code diff and reports diff). I've done that earlier for R code itself, but it has not been used regularly. Might be worth resurrecting it in some way.