kalibera / rchk

102 stars 10 forks source link

rchk seems to flag allocation functions incorrectly sometimes #5

Closed viking closed 6 years ago

viking commented 6 years ago

In the yaml package, rchk lists the following problems:

https://raw.githubusercontent.com/kalibera/cran-checks/master/rchk/results/yaml.out

It notes that yaml_emitter_emit is an allocation function that could cause R to run garbage collection and therefore destroy an unprotected object. However, this function does not use any R functions whatsoever. Does rchk look for any malloc call or just R-specific allocation functions?

Thanks in advance.

joshuaulrich commented 6 years ago

This output appears consistent with the rchk documentation (emphasis added):

Note that sometimes the tool can give surprising annotations. A particular case is that it conservatively assumes that any call to external code may allocate from the R heap, which is often not the case.

kalibera commented 6 years ago

Thanks, Joshua, that's a perfect answer - this is probably a false alarm for this reason and hence safe to ignore (if you're sure the function is external and would never call back into any function from R, which I assume is the case here, please ignore this report).

viking commented 6 years ago

Thanks for your reply. I'm happy to ignore the warning, however, the CRAN maintainers seem to be relying on rchk for package submissions. I want to avoid having to make a case each time I need to update the yaml package.

viking commented 6 years ago

I could simply add extraneous PROTECT and UNPROTECT calls before and after using 3rd party library functions. I thought it best to open an issue here to see if the rchk tool could be improved in order to avoid this kind of thing. I don't want to add extra code unnecessarily.

kalibera commented 6 years ago

Note that R_yoink is an allocating function which may also be called when names is in scope, to fix that warning you would have to protect anyway, and this will make the false alarms go away (it is common to protect when you allocate and unprotect when it is clear the variable may no longer be used, so you'll cover the calls to external functions incorrectly deemed as allocating, anyway).

The tool cannot do much about external code it does not know. It could make the opposite guess -- assume that external code does not allocate, but it would then miss real errors in other cases.

If you were curious, you can find more about the tool at:

https://github.com/kalibera/cran-checks/blob/master/rchk/PROTECT.md https://github.com/kalibera/rchk/blob/master/README.md (and linked documents)

viking commented 6 years ago

In this case, the third party code is actually packaged up inside the R package. The rchk tool has access to it for analysis.

kalibera commented 6 years ago

Isn't that in the YAML library? (external)

viking commented 6 years ago

The YAML library is built alongside the R extension:

https://github.com/viking/r-yaml/tree/master/pkg/src

The source files with names that are not prefixed with r_ belong to libyaml. This is currently necessary because there's a patch.

kalibera commented 6 years ago

I see, so I've traced the detailed decisions of the tool in this case and it treats yaml_emitter_flush as an allocating function, because this function calls another function via a function pointer emitter->write_handler. The tool conservatively assumes that calls via a function pointer allocate. The tool does not know which function will be called, so it can't do better and this is by design. Extending the analysis so that the tool would know which functions are called via pointers will be too complicated (and won't work in most cases anyway). The tool treats yaml_emitter_emit as an allocating function, because it (transitively) calls into yaml_emitter_flush.

viking commented 6 years ago

Okay, that makes sense. I was able to silence the warnings by adding more GC protection. I was under the impression that the names attribute was protected under the umbrella of a protected object, that's why I didn't protect it before.

kalibera commented 6 years ago

Thanks! Re names, it depends on the type of the object, sometimes the names attribute is implicitly protected (e.g. currently for vectors, vector lists, etc), but sometimes not (an obvious example are objects represented using pairlists). I think one should protect names always to make the code robust to future changes in R (e.g. need for some kind of a new conversion to retrieve attributes) or changes in the application. Particularly with the R API which does not have explicit representation of R types (just SEXP), it is often hard to see on which object types getAttrib can be called at particular code location. rchk in some cases can derive these types and won't issue a warning, but usually it can't and it would be too difficult to support that. Doing so manually is often very hard as well, as it involves whole program analysis.