rchk "falsely claims" imbalanced protect `Rf_onintr` and `Rf_error` used by Rcpp

kalibera / rchk

102 stars 10 forks source link

rchk "falsely claims" imbalanced protect `Rf_onintr` and `Rf_error` used by Rcpp #34

Closed dipterix closed 2 months ago

dipterix commented 3 months ago

Original case here: https://github.com/RcppCore/Rcpp/issues/1318

Basically Rcpp automatically generates a RcppExport function with 1 PROTECT and 3 UNPROTECT. The author of Rcpp thinks Rf_onintr and Rf_error should break out of the function call, hence it was rchk falsely claiming the imbalanced protect.

I'm sorry don't know how to generate a minimum playable example here. Tried my best to simplify, and here's the skeleton of the function causing the rchk to firing alarms.

RcppExport SEXP _myFunction(...) {
    ...
    rcpp_result_gen = PROTECT(...);

    ...
    if (rcpp_isInterrupt_gen) {
        UNPROTECT(1);
        Rf_onintr(); 
    }
    ...
    if (rcpp_isError_gen) {
        UNPROTECT(1);
        Rf_error(...); 
    }
    UNPROTECT(1);
    return rcpp_result_gen;
}

My original proposal at Rcpp is there should be a return statement after Rf_onintr and Rf_error. But Rcpp author thought it's rchk's responsibility to break after Rf_onintr and Rf_error. (It could also be my fault!)

kalibera commented 2 months ago

I can't really help much without a specific example. But in principle, R on long jumps automatically unprotects (restores the pointer protection stack depth). One should not be unprotecting before calling Rf_error() - it is R that will do this work. The PROTECT/UNPROTECT calls should be placed as if functions called were returning normally and it is the responsibility of the user code to ensure protection balance on (only) local returns. Note that in most cases, the caller doesn't know whether the function called will return normally or not. So, in the code above, there shouldn't be "UNPROTECT(1)" before neither Rf_onintr() nor before Rf_error().

dipterix commented 2 months ago

I guess my confusion is there "shouldn't" or unprotect is "not necessary"?

If the stack is already balanced before Rf_error(), wouldn't it be OK?

Just to prove the point, the following code doesn't trigger any issue.

library(inline)
cpp.fun = cxxfunction(
  signature(), 
  body=r"(
SEXP re = PROTECT(Rf_allocVector(REALSXP, 2));
bool a = true;
if( a ) {
  UNPROTECT(1);
  Rf_error("Nah");
}
UNPROTECT(1);
return R_NilValue;
)")

cpp.fun()

kalibera commented 2 months ago

I guess my confusion is there "shouldn't" or unprotect is "not necessary"?

At least from the design principle, it shouldn't be unprotected before. Also the code is easier to read and understand when you don't unprotect (as I wrote, with many functions, it may not be that easy to say whether they long jump or not). It is not just a tool that checks pointer protection balance, it is also humans, and should be in the first place.

If the stack is already balanced before Rf_error(), wouldn't it be OK?

Whether one can unprotect before call to Rf_error() without causing an actual memory error depends on what Rf_error() does in that particular case. Often it wouldn't be a problem.