klmr / box

Write reusable, composable and modular R code
https://klmr.me/box/
MIT License
829 stars 47 forks source link

Unexpected error when trying to access non-existent variable from module #360

Open ArcadeAntics opened 3 months ago

ArcadeAntics commented 3 months ago

Error description

I tried to access a variable from a module but had mistyped the name, so it tried to throw an error with message "name '%s' not found in '%s'". Instead of doing that, it threw a different error with message "'Rf_translateChar' must be called on a CHARSXP, but got 'symbol'" because the C function assumes the first argument is a symbol. As an example:

box::use(utils)
x <- list(utils = utils)
x$utils$test

I'm not sure what the exact expected output would be, but any of the following error messages would suffice:

That being said, I think your C code to generate such an error is unnecessarily complex, and could be greatly simplified with the use of .External2 instead of .Call. I could submit a PR if you like!

R version

platform       x86_64-w64-mingw32               
arch           x86_64                           
os             mingw32                          
crt            ucrt                             
system         x86_64, mingw32                  
status                                          
major          4                                
minor          3.3                              
year           2024                             
month          02                               
day            29                               
svn rev        86002                            
language       R                                
version.string R version 4.3.3 (2024-02-29 ucrt)
nickname       Angel Food Cake

‘box’ version

[1] ‘1.2.0’

klmr commented 3 months ago

Thanks for the catch!

I don’t know much about .External2. Its documentation is very sparse but suggests that it is an internal function (that said, I believe that you are using it in ‘this.path’ without issues with CRAN, right?). And it isn’t mentioned in any of the R manuals (R-lang, R-ints or R-exts).

Do you know about the performance implications of switching to .External2? The current version is the result of benchmarking to make $ as fast as possible (for the non-error case). That’s also why all the “heavy lifting” only happens in the error path.

I’d be happy about a PR!

ArcadeAntics commented 2 months ago

.External2 allows you to write C functions that have the same formal arguments as R's C functions. The formal arguments are (SEXP call, SEXP op, SEXP args, SEXP rho) though R sometimes uses env instead of rho:

so for $.box$mod, if the line was updated to .External2(c_strict_extract, e1, e2):

It means you can entirely skip the process of finding parent in the C function strict_extract and just use rho instead.

I use it in package:this.path everywhere, no problems with CRAN.

I would've originally told you that the performance between .Call and .External2 is identical, but it doesn't seem that way. In do_External, you can see the lines of code R_args_enable_refcnt(args); and R_try_clear_args_refcnt(args);. .Call does not contain these lines of code because the args pairlist is never given to the user. At least on my computer, this adds about 100 nanoseconds to each $.box$mod call, originally taking about 1000 nanoseconds. I don't know if you'd consider that negligible or not. In the PR I'll submit soon, I'll change the .Call C function with the necessary fixes, and then I'll make a copy of it and make it an .External2 C function so that you can check the benchmarks yourself.

ArcadeAntics commented 2 months ago

Sorry to so suddenly change my mind, but I just remembered why I switched from using .Call to .External2.

At one point, I was doing something similar to obtain parent where the call to .Call was evaluated. It worked in very simple cases, such as the one you have right now, but it fails for more complex cases. For example, my R function try.this.path uses a call to .External2 inside a call to tryCatch. If you were to do something similar, i.e. a tryCatch around a call to c_strict_extract, parent would be the evaluation environment of tryCatch and not the environment where c_strict_extract was evaluated. So you might say that since it works in the current case, it doesn't need to be changed, but for the sake of future proofing your code and avoiding a headache trying to fix it, I think it would be best to use .External2 right away.

So now that we know the current method of determining parent doesn't always work, the choices are:

and between the two of those, .External2 is faster and easier to read at the C level, so I'll use it in the PR I submit. I should hopefully have it done very soon.