klmr / box

Write reusable, composable and modular R code
https://klmr.me/box/
MIT License
829 stars 47 forks source link

S3 method "registered" in the global environment are not respected inside a `box`-ed module #304

Closed ArcadeAntics closed 1 year ago

ArcadeAntics commented 1 year ago

Error description

I have an S3 class "testClass" and I have a object / / method for said class:

obj <- structure(0:25, class = "testClass")
names.testClass <- function(x) letters

this function is declared in the global environment. I also have a script "test.R" that looks like this:

writeLines("testFun <- function(x) names(x)", "./test.R")

When I load this module with box::use(./test) and use test$testFun(obj), it fails to recognize this "registered" method (I say registered in quotes because it is not actually registered with registerS3method(), just that evalq(names(obj), .GlobalEnv) works as expected.

This comes from the fact that 'box' sets the parent environment of "namespace:test" to baseenv() instead of .BaseNamespaceEnv (if the parent environment of "namespace:test" was the base namespace, the parent of which is the global environment, then the method would be found from the context of the module.)

If you'd like a more reproducible example, try this:

obj <- structure(0:25, class = "testClass")
names.testClass <- function(x) letters

exprs <- expression(testFun <- function(x) names(x))

envir1 <- new.env(parent = .BaseNamespaceEnv)
source(exprs = exprs, local = envir1)
envir1$testFun(obj)

envir2 <- new.env(parent = baseenv())
source(exprs = exprs, local = envir2)
envir2$testFun(obj)

I find the following output:

> envir1$testFun(obj)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
[25] "y" "z"
> envir2$testFun(obj)
NULL

However, I'm sure changing the parent environment to the base namespace instead of the base environment has some drastic consequences, so I'm not sure how realistic this is.

R version

platform       x86_64-w64-mingw32               
arch           x86_64                           
os             mingw32                          
crt            ucrt                             
system         x86_64, mingw32                  
status                                          
major          4                                
minor          2.2                              
year           2022                             
month          10                               
day            31                               
svn rev        83211                            
language       R                                
version.string R version 4.2.2 (2022-10-31 ucrt)
nickname       Innocent and Trusting            

also on my Linux computer:  

platform       x86_64-pc-linux-gnu               
arch           x86_64                           
os             linux-gnu                                              
system         x86_64, linux-gnu                
status                                          
major          4                                
minor          2.1                              
year           2022                             
month          06                               
day            23                               
svn rev        82513                            
language       R                                
version.string R version 4.2.1 (2022-06-23)
nickname       Funny-Looking Kid            

This also fails on several older versions of R.

‘box’ version

1.1.2

klmr commented 1 year ago

This is by design: you must register S3 methods for them to be found. And this is generally the case, not just with ‘box’. For instance, as of version 3.6, R no longer searches all attached environments, only the current environment and the top environment. Try the following:

e <- new.env()

local(envir = e, {
  names.testClass <- function(x) letters
})
attach(e)

obj <- structure(0:25, class = "testClass")
names(obj)

This prints NULL rather than letters, even though names.testClass is on the search path and could be called explicitly.

The S3 documentation of R is unfortunately incredibly shoddy but the expected usage in script code is to register S3 methods explicitly using .S3method, e.g.:

…
local(envir = e, {
  names.testClass <- function(x) letters
  .S3method('names', 'testClass')
})
…

This makes the above code work, and it also makes the ‘box’ module work.


I'm sure changing the parent environment to the base namespace instead of the base environment has some drastic consequences

Correct. First off, as noted above this wouldn’t actually fix the problem, except for the very limited case of method definitions inside .GlobalEnv, but nowhere else. Secondly, the reason this fix works is precisely the reason why box does not use the base namespace as the parent environment: namely, parent.env(.BaseNamespaceEnv) is .GlobalEnv, and that leads to error-prone code, because it means that module code might be accidentally accessing names that are not defined inside the module. In fact, not just .GlobalEnv but all attached packages would leak into the module’s namespace.

Consider the following module:

fllter = function (x, cond) x[cond] # oops, typo in name!

#' @export
even_odd = function (x) filter(x, rep(c(FALSE, TRUE), length.out = nrow(x)))

In the hypothetical ‘box’ implementation (and the actual implementation of R packages!), the author tests the code, it works, so the author ships it. But then users complain that the code doesn’t work for them. Worse, the code actually ran and produced a result, and they noticed too late that the result was wrong, losing their company millions dollars.

Turns out, the author had the ‘dplyr’ package attached while testing their code, so the filter call inside the module invoked dplyr::filter. The users didn’t have ‘dplyr’ attached, so the module instead called stats::filter instead.

‘box’ prevents this situation by being intentionally strict about how names are searched.

ArcadeAntics commented 1 year ago

This is by design: you must register S3 methods for them to be found. And this is generally the case, not just with ‘box’. For instance, as of version 3.6, R no longer searches all attached environments, only the current environment and the top environment. Try the following:

e <- new.env()

local(envir = e, {
  names.testClass <- function(x) letters
})
attach(e)

obj <- structure(0:25, class = "testClass")
names(obj)

This prints NULL rather than letters, even though names.testClass is on the search path and could be called explicitly.

The S3 documentation of R is unfortunately incredibly shoddy but the expected usage in script code is to register S3 methods explicitly using .S3method, e.g.:

…
local(envir = e, {
  names.testClass <- function(x) letters
  .S3method('names', 'testClass')
})
…

This makes the above code work, and it also makes the ‘box’ module work.

I'm sure changing the parent environment to the base namespace instead of the base environment has some drastic consequences

Correct. First off, as noted above this wouldn’t actually fix the problem, except for the very limited case of method definitions inside .GlobalEnv, but nowhere else. Secondly, the reason this fix works is precisely the reason why box does not use the base namespace as the parent environment: namely, parent.env(.BaseNamespaceEnv) is .GlobalEnv, and that leads to error-prone code, because it means that module code might be accidentally accessing names that are not defined inside the module. In fact, not just .GlobalEnv but all attached packages would leak into the module’s namespace.

Consider the following module:

fllter = function (x, cond) x[cond] # oops, typo in name!

#' @export
even_odd = function (x) filter(x, rep(c(FALSE, TRUE), length.out = nrow(x)))

In the hypothetical ‘box’ implementation (and the actual implementation of R packages!), the author tests the code, it works, so the author ships it. But then users complain that the code doesn’t work for them. Worse, the code actually ran and produced a result, and they noticed too late that the result was wrong, losing their company millions dollars.

Turns out, the author had the ‘dplyr’ package attached while testing their code, so the filter call inside the module invoked dplyr::filter. The users didn’t have ‘dplyr’ attached, so the module instead called stats::filter instead.

‘box’ prevents this situation by being intentionally strict about how names are searched.

Isn't this just a better reason to be making packages instead of boxes? R CMD check would immediately pick up that no function 'filter' was imported.

klmr commented 1 year ago

Isn't this just a better reason to be making packages instead of boxes?

How so? ‘box’ modules are stricter (and thus more robust) than R packages. The behaviour I outlined in my comment above fixes what I perceive as a serious flaw in the R package system.

R CMD check would immediately pick up that no function 'filter' was imported.

Yes, if you run it. But R CMD check also has limitations since it performs static analysis, and R is an inherently dynamic language. So static analysis won’t find every possible usage of a name. R CMD CHECK has both false negatives and false positives: code that — for whatever reason — uses e.g. get("filter") instead of filter would pass R CMD check and fail silently at runtime. Conversely, code that uses NSE (correctly) would create a false-positive flag in the check.

But fundamentally it’s true that having some static checks would be beneficial. Adding this functionality to ‘box’ is on the to-do list.