kalibera / rchk

102 stars 10 forks source link

rchk checks internal C functions for protect stack imbalances #6

Closed viking closed 6 years ago

viking commented 6 years ago

For the yaml package, rchk lists the following problems:

https://raw.githubusercontent.com/kalibera/cran-checks/master/rchk/results/yaml.out

It looks like rchk is checking that internal C functions have a balanced stack before and after execution, even when these functions are not being called from R. The yaml package code handles its own protection calls, since it needs to collect R objects onto a managed stack and pop them off as needed. The objects are protected and unprotected carefully so that the end result is a balanced protection stack. Does rchk consider this type of programming style or will I need to rewrite my package so that CRAN checks will pass?

Thanks in advance.

joshuaulrich commented 6 years ago

I find the documentation is clear about what is and isn't considered when rchk is looking for potential stack imbalances (emphasis added):

The tool can handle different versions of the protect calls, a single protection counter per function (commonly named nprotect), certain kinds of loops, saving and restoring the pointer protection stack explicitly, and also some forms of conditionals (e.g. if (nprotect) UNPROTECT(nprotect)). There are still some false alarms and bailouts for code that is still too complicated, and indeed there are not-useful reports for functions that have protection stack imbalance by design.

It's also not relevant how internal C functions are called, whether from R or otherwise. If they allocate an R object, that object needs to be protected from garbage collection as long as it will be used.

I personally would try to re-write my function to be more amenable to rchk's protection stack imbalance checks, simply for my peace of mind.

kalibera commented 6 years ago

Thanks, Joshua, I agree and I would add that UNPROTECT_PTR that is used in the code is also very slow. I would recommend to rewrite the code to have balanced functions. In this case, the natural way would be to let the GC trace the structure that implements the stack, that is, one would need just one protect. One way would be to model the stack using low-level R objects (e.g. a combination of pairlists and vector lists) with external pointers for things that are outside the C heap. This might be an intrusive change, perhaps - so alternatively one could just create an extra stack only for the R objects that are now just one of the things you are putting on the stack - which could be implemented using pairlists. There will be one dummy object that will be protected and the immediate child of this object (CDR) will be the head of the stack (or R_NilValue if the stack was empty). One would then just always pop/push from two stacks instead of one, but all the protect issues will go away.

viking commented 6 years ago

@joshuaulrich The objects are properly protected. The issue is whether or not each internal function should have a balanced stack as opposed to the suite of functions being used after crossing into C from R. The rchk tool seems a bit naive.

viking commented 6 years ago

@kalibera That is a reasonable idea. Thanks.

viking commented 6 years ago

I understand that static analyzers are not easy to write and that increasing the complexity of tools like this to handle more cases is not trivial. I will try @kalibera's suggestion.

viking commented 6 years ago

I was able to refactor my code to use a pairlist as a stack. This eliminates the rchk problems and is overall a better solution, I think. Thanks for your help.