Avoid non-function code in packages

gaborcsardi commented 6 years ago

This issue is for discussion. In general it is good practice to avoid code outside of function definitions in R packages, because non-function code creates build-time (i.e. R CMD INSTALL time) dependencies.

In particular, R6Class() is typically (?) called outside of function definitions, and I am wondering if we can do better. In particular in the ps package, I am doing this: https://github.com/r-lib/ps/blob/764dc78c03691d5692f28c874e87bd31a924dd85/R/osx.R

ps_env <- new.env(parent = emptyenv())
process_osx <- function() {
  if (is.null(ps_env$process_osx)) {
    ps_env$process_osx <- R6Class(
      "process_osx",
      cloneable = FALSE,
      public = list(
      ...
    )
  )
}

and then to create a new object you would call process_osx()$new(...). Storing the class object speeds up object creation.

This seems to be ok to me, the enclosing environments of the methods look good as well:

❯ cl <- process_osx()
❯ ins <- cl$new(Sys.getpid())
❯ ls(environment(ins$exe))
[1] "self"  "super"
❯ ls(parent.env(environment(ins$exe)))
[1] "ps_env"
❯ parent.env(parent.env(environment(ins$exe)))
<environment: namespace:ps>

The first is the env with self, then the execution env of function that created the class, and then the package env.

@wch Do you think this is OK? I think it is, but I might be missing something.

wch commented 6 years ago

I think that should work. I suppose you could even do it with an active binding, so that the interface is just like using a regular R6Class:

ps_env <- new.env(parent = emptyenv())

makeActiveBinding("process_osx",
  function() {
    if (is.null(ps_env$process_osx)) {
      message("creating class")
      ps_env$process_osx <- R6Class(
        "process_osx",
        public = list(
          x=1
        )
      )
    }
    ps_env$process_osx
  },
  environment()
)

process_osx$new()

I'm always a little wary of active bindings, though.

All that said, it should be safe to call R6Class() at the top level of package code and save the result, because all the functions used by that object are saved in a special environment which gets saved along with the R6Class object when the package is built. The R6Class object does not use anything that is in the R6 namespace, so using R6 this way introduces only a build-time dependency (as opposed to the run-time dependency in your code and my code above).

Of course, if that R6Class object is exported and then another package called $new() at the top level, then you have a problem, but none of the things we've discussed here would solve that.

The encapsulate() function here is what R6 uses to store some functions in the special environment: https://github.com/r-lib/R6/blob/bb56213/R/aaa.R

gaborcsardi commented 6 years ago

I think that should work.

Thanks.

I'm always a little wary of active bindings, though.

Rightly so, I think. AFAIR R CMD check has problems with active bindings in the package env, and they might lead to memory leaks as well.

All that said, it should be safe to call R6Class() at the top level of package

Good to know, thanks!

wch commented 6 years ago

I just tested this to be sure that an R6ClassGenerator doesn't require any code from the R6 package, by creating a dummy version of the package that has no R code at all, on this branch: https://github.com/r-lib/R6/tree/dummy-package

To test, I installed Shiny (which has many calls to R6Class() at the top level), then installed the dummy version of R6, then restarted R and ran a Shiny application. Everything worked fine, which confirms that, the R6ClassGenerator object and the R6 objects it creates do not need any code from the R6 package itself.

There are a few things that won't work correctly: S3 methods like print.R6, plot.R6 are missing, as well as is.R6. But those are typically called from outside an R6 class, so I think this isn't too bad.

@gaborcsardi If you could test it out to confirm (and make sure I'm reasoning about it correctly), I'd appreciate it.

wch commented 6 years ago

Now that I think about it, I wonder if copies of those utility functions (print.R6, is.R6, etc) should be stored inside the "capsule" environment. Then the risk of breakage could be reduced from low (in the cases where an R6 class uses these functions) to zero. If those functions change in the future, though, there could be a mismatch in the code for, say, print.R6 used by a class that's baked into a package, compared the version that's available for the rest of the R session.

r-lib / R6

Avoid non-function code in packages #149