kalibera / rchk

102 stars 10 forks source link

protect stack is too deep/unsupported form of unprotect #21

Closed kingaa closed 5 years ago

kingaa commented 5 years ago

Running rchk on my package pomp, I get several messages like the following.

Function add_args
  [UP] protect stack is too deep, unprotecting all variables, results will be incomplete
  [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /home/kingaa/projects/Rpkg/rchk/build/IAv0e71i/pomp/src/dprior.c:27

These arise in constructions such as the following (this one is the relevant code chunk from dprior.c mentioned above).

static R_INLINE SEXP add_args (SEXP names, SEXP log, SEXP args)
{

  int nprotect = 0;
  SEXP var;
  int v;

  PROTECT(args = LCONS(AS_LOGICAL(log),args)); nprotect++;
  SET_TAG(args,install("log"));

  for (v = LENGTH(names)-1; v >= 0; v--) {
    PROTECT(var = NEW_NUMERIC(1)); nprotect++;
    PROTECT(args = LCONS(var,args)); nprotect++;
    SET_TAG(args,install(CHAR(STRING_ELT(names,v))));
  }

  UNPROTECT(nprotect);
  return args;

}

As you can see, this function is designed to take a arbitrary-length vector of variable names and construct a call from it. This is extremely useful and, I believe, totally legitimate.

My question is as to how I should interpret the message thrown by rchk. I suspect they reflect the inability of current rchk codes to follow the loop of arbitrary length. Is this correct? If so, do you have any plans or see any way to extend rchk's capacity in this way?

kalibera commented 5 years ago

In principle, you can ignore these warnings from the tool, it just tells the tool cannot understand this code properly. In this case the report is from [UP], which is checking for individual unprotected variables, and that part of the tool cannot handle loops where the protection stack depth increases with every loop iteration. Other parts of the tool, e.g. checking for protection balance ([PB]), can do it. I am afraid that the type of modelling that [UP] is doing cannot be fixed to support this pattern.

But as you showed this code, please note that the unbounded protection is not necessary. Only the head of a pairlist, in this example of args, needs to be protected at any time. You can use REPROTECT to achieve that or you can have a dummy cons-cell (an extra stable head of the pairlist) that will stay protected for the whole duration of the loop, and args would be linked to that dummy cell. Note also that var does not have to be protected as soon as it gets into a protected cons cell (protected via args), so it does not have to stay explicitly protected after the body loop exits. Btw also note the function installChar. Typically, nprotect is used in code where it is cognitively too diffucult to track what is the stack depth, but the stack depth is still constant. But of course if you know that the stack depth increase will be very small, that the length of names will be very small, the code should work fine.

kingaa commented 5 years ago

@kalibera: thanks for the timely response and the useful tips.

It might be too much to ask (just say so if it is), but I wonder if you could show me, in the context of the above snippet of C code, how to use the REPROTECT facility or the dummy cons cell approach. Again, no worries if you're too busy, etc.

kalibera commented 5 years ago

No problem, you can use REPROTECT like this:

PROTECT_INDEX px;

PROTECT_WITH_INDEX(args = LCONS(AS_LOGICAL(log),args), &px);
SET_TAG(args,install("log"));

for (v = LENGTH(names)-1; v >= 0; v--) {
  PROTECT(var = NEW_NUMERIC(1));
  REPROTECT(args = LCONS(var,args), px);
  UNPROTECT(1); /* var */
  SET_TAG(args,installChar(STRING_ELT(names,v)));
}

UNPROTECT(1); /* args */

See Writing R Extensions for more details.

kalibera commented 5 years ago

In principle you don't have to protect var, because LCONS is callee-protect, so that the code can be simplified further. But there is no harm protecting it, and one then does not have to worry which functions are callee-protect:

PROTECT_INDEX px;

PROTECT_WITH_INDEX(args = LCONS(AS_LOGICAL(log),args), &px);
SET_TAG(args,install("log"));

for (v = LENGTH(names)-1; v >= 0; v--) {
  var = NEW_NUMERIC(1);
  REPROTECT(args = LCONS(var,args), px);
  SET_TAG(args,installChar(STRING_ELT(names,v)));
}

UNPROTECT(1); /* args */
kalibera commented 5 years ago

Anyway, with that assumption if you wanted to avoid REPROTECT, you could also do this:

PROTECT(args = LCONS(AS_LOGICAL(log),args));
SET_TAG(args,install("log"));

for (v = LENGTH(names)-1; v >= 0; v--) {
  var = NEW_NUMERIC(1);
  args = LCONS(var,args);
  UNPROTECT(1); /* old args */
  PROTECT(args);
  SET_TAG(args,installChar(STRING_ELT(names,v)));
}

UNPROTECT(1); /* args */
kingaa commented 5 years ago

Excellent! Thanks so much, @kalibera! This will jump-start me in simplifying the code.

Say, do you know what the performance overhead associated with PROTECT, UNPROTECT, and REPROTECT is?

kalibera commented 5 years ago

I'd say that unbounded protection due to a loop increasing the protection stack depth is always good to avoid. For the remaining, the performance difference between REPROTECTing the item on the very top of the stack and a call to UNPROTECT(x); PROTECT(x) is so small compared to other performance costs in any program, that one should choose the variant that makes the code more readable. In this case it may be a matter of taste, I myself prefer the UNPROTECT(x); PROTECT(x). Other than the unbounded protection, I don't think that in a user program one would ever get to a situation when the direct execution time overhead of PROTECT, UNPROTECT and REPROTECT would be relevant, so the only constraints should be correctness, robustness and readability of the code.