r-lib / R6

Encapsulated object-oriented programming for R
https://R6.r-lib.org

Issue on using R6 classes and foreach() %dopar% together #141

Open VMOrca opened 6 years ago

VMOrca commented 6 years ago

I'm having an issue using R6 classes together with foreach(), possibly related to environments (I'm using Windows). I don't know whether I forgot to set some environment variables or whether it's a bug...

Suppose that there are two R6 classes, "class1" and "class2", where method1 in class1 depends on class2 (see the example code below). The issue is that if I use class1 inside foreach() %dopar%, R doesn't seem to recognise class2, even if I explicitly set .export = c("class1", "class2") in the foreach() call. However, if I use class2 directly inside foreach(), it works fine.

So the problem seems to be that if class2 is "nested" within class1, then class2 does not work with foreach(). I feel like this has to do with environments, but I can't figure out how. I even tried .export = ls(.GlobalEnv), but it still doesn't work...

I can get around this by instantiating an object from class2 and passing it as an extra parameter to method1, e.g. method1 = function(input = 1:3, objectFromClass2) when defining class1, but that may not be an optimal solution in the long run, especially since code maintainability and ease of debugging are my priorities (which is why I'm using R6's OO features in the first place).
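A minimal sketch of that workaround, assuming a local PSOCK cluster via doParallel and a stripped-down version of the classes below: class2 is instantiated directly in the loop body, where the objects shipped to the worker are visible, and the instance is passed to method1, so class1 never has to look up class2 itself.

```r
library(R6)
library(foreach)
library(doParallel)

class2 <- R6Class("class2", public = list(
  method2 = function(input) input + 1
))

# class1 no longer looks up class2 itself; the caller passes an instance in.
class1 <- R6Class("class1", public = list(
  method1 = function(input = 1:3, objectFromClass2) {
    output <- objectFromClass2$method2(input)
    output * 3
  }
))

cl <- makeCluster(2)
registerDoParallel(cl)

# class1 and class2 are referenced in the body, so doParallel exports them;
# class2 is resolved here, not through class1's parent_env.
res <- foreach(input = 1:3, .packages = "R6") %dopar% {
  y <- class1$new()
  y$method1(input, class2$new())
}

stopCluster(cl)
unlist(res)
```

The cost is the one already noted: every caller of method1 has to supply the dependency.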

Many thanks in advance!

Here is an example of code:

library(R6)
library(foreach)
library(doParallel)

cl = makeCluster(3)
registerDoParallel(cl)

class1 = R6Class(
  "class1", 
  public = list(
    method1 = function(input = 1:3){
      y = class2$new()
      output = y$method2(input)
      return (output * 3)
    }
  )
)

class2 = R6Class(
  "class2", 
  public = list( 
    method2 = function(input) {
      return (input + 1)
    }
  )
)

# This doesn’t work. 
# Error in { : task 1 failed - "object 'class2' not found"
foreach(input = 1:3, .packages = "R6", .export = c("class1", "class2")) %dopar% {
  y = class1$new()
  z = y$method1(input)
  return (z)
}

# This works
foreach(input = 1:3, .packages = "R6", .export = c("class1", "class2")) %dopar% {
  y = class2$new()
  z = y$method2(input)
  return (z)
}

# Class1 also works fine if it’s called outside of foreach()
y = class1$new()
z = y$method1(1:3)

stopCluster(cl)

wch commented 5 years ago

I think the source of the problem is an interaction between foreach and the way the R6 generator object keeps track of the parent environment. The generator has a field called parent_env, and when the class is instantiated, the object's methods can find objects in that environment.

For example:

> class1$parent_env
<environment: R_GlobalEnv>
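To see what parent_env controls, here is a small standalone sketch (not from this thread) that sets parent_env explicitly when creating a generator. The names helpers, scale_by_ten and Gen are hypothetical; the point is only that a method finds scale_by_ten through the generator's parent_env even though it does not exist in the global environment.

```r
library(R6)

# Hypothetical environment holding a helper that is NOT in the global env.
helpers <- new.env()
helpers$scale_by_ten <- function(x) x * 10

Gen <- R6Class("Gen",
  parent_env = helpers,   # free names in methods are looked up here
  public = list(
    run = function(x) scale_by_ten(x)
  )
)

Gen$parent_env     # the helpers environment, not R_GlobalEnv
Gen$new()$run(2)   # 20
```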

For regular functions, foreach() makes exported objects available in a child environment of the global environment.

library(parallel)
library(doParallel)
cl = makeCluster(3)
registerDoParallel(cl)

f <- function(x) {
  g(x)
}

g <- function(x) {
  list(
    parenvs = pryr::parenvs(),
    rls = pryr::rls()
  )
}

res <- foreach(input = 1:2, .export = c("f", "g")) %dopar% {
  f(input)
}
str(res)
#> List of 2
#>  $ :List of 2
#>   ..$ parenvs:List of 3
#>   .. ..$ :<environment: 0x7fb7c6153910> 
#>   .. ..$ :<environment: 0x7fb7c6153948> 
#>   .. ..$ :<environment: R_GlobalEnv> 
#>   .. ..- attr(*, "class")= chr "envlist"
#>   ..$ rls    :List of 3
#>   .. ..$ : chr "x"
#>   .. ..$ : chr [1:3] "f" "g" "input"
#>   .. ..$ : chr(0) 
#>  $ :List of 2
#>   ..$ parenvs:List of 3
#>   .. ..$ :<environment: 0x7fb7c614bde0> 
#>   .. ..$ :<environment: 0x7fb7c614be18> 
#>   .. ..$ :<environment: R_GlobalEnv> 
#>   .. ..- attr(*, "class")= chr "envlist"
#>   ..$ rls    :List of 3
#>   .. ..$ : chr "x"
#>   .. ..$ : chr [1:3] "f" "g" "input"
#>   .. ..$ : chr(0) 

In your example with R6, class1$parent_env points to the global environment, not to the child environment that foreach populates with the exported objects, so the methods of class1 objects can't find class2.

library(R6)
library(parallel)
library(doParallel)
cl = makeCluster(3)
registerDoParallel(cl)

class1 <- R6Class(
  "class1", 
  public = list(
    method1 = function(input = 1:3){
      list(
        parenvs = pryr::parenvs(),
        rls = pryr::rls()
      )
    }
  )
)

class2 <- 1234

res <- foreach(input = 1:2, .packages = "R6", .export = c("class1", "class2")) %dopar% {
  y <- class1$new()
  y$method1(input)
}

str(res)
#> List of 2
#>  $ :List of 2
#>   ..$ parenvs:List of 3
#>   .. ..$ :<environment: 0x7fb7c7ef12b8> 
#>   .. ..$ :<environment: 0x7fb7c7ef12f0> 
#>   .. ..$ :<environment: R_GlobalEnv> 
#>   .. ..- attr(*, "class")= chr "envlist"
#>   ..$ rls    :List of 3
#>   .. ..$ : chr "input"
#>   .. ..$ : chr "self"
#>   .. ..$ : chr(0) 
#>  $ :List of 2
#>   ..$ parenvs:List of 3
#>   .. ..$ :<environment: 0x7fb7c86acdc8> 
#>   .. ..$ :<environment: 0x7fb7c86ace00> 
#>   .. ..$ :<environment: R_GlobalEnv> 
#>   .. ..- attr(*, "class")= chr "envlist"
#>   ..$ rls    :List of 3
#>   .. ..$ : chr "input"
#>   .. ..$ : chr "self"
#>   .. ..$ : chr(0) 
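The same lookup failure can be reproduced without a cluster. This sketch assumes, based on the pryr output above, that foreach evaluates the loop body in a child of the global environment holding the exported objects; exportenv below is a hand-made stand-in for that environment, not foreach's actual internals.

```r
library(R6)

# Stand-in for foreach's export environment: a child of the global env.
exportenv <- new.env(parent = globalenv())

exportenv$class2 <- R6Class("class2", public = list(
  method2 = function(input) input + 1
))
# parent_env = globalenv() mimics class1 having been defined at top level.
exportenv$class1 <- R6Class("class1", parent_env = globalenv(), public = list(
  method1 = function(input = 1:3) {
    y <- class2$new()   # looked up via class1$parent_env, not exportenv
    y$method2(input) * 3
  }
))

# Evaluating the "loop body" in exportenv still fails to find class2.
msg <- tryCatch(
  eval(quote(class1$new()$method1(1)), exportenv),
  error = function(e) conditionMessage(e)
)
print(msg)

# Pointing parent_env at exportenv makes class2 visible to method1.
exportenv$class1$parent_env <- exportenv
eval(quote(class1$new()$method1(1)), exportenv)   # 6
```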

I can think of a couple of workarounds, but they're not great.

One possibility: instead of passing class1 and class2 to foreach(), write a function that creates class1 and class2, then pass that function to foreach().
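A minimal sketch of that first workaround, using a hypothetical helper make_classes() and assuming a local PSOCK cluster via doParallel. Because R6Class() defaults to parent_env = parent.frame(), both generators created inside one call get that call's environment as their parent_env, so class1's methods can find class2 there on each worker.

```r
library(foreach)
library(doParallel)

# Hypothetical helper: builds both generators inside a single call, so each
# generator's parent_env is this call's environment, where both live.
make_classes <- function() {
  class2 <- R6::R6Class("class2", public = list(
    method2 = function(input) input + 1
  ))
  class1 <- R6::R6Class("class1", public = list(
    method1 = function(input = 1:3) {
      y <- class2$new()   # found in make_classes()'s environment
      y$method2(input) * 3
    }
  ))
  list(class1 = class1, class2 = class2)
}

cl <- makeCluster(2)
registerDoParallel(cl)

# make_classes is referenced in the body, so doParallel ships it to workers.
res <- foreach(input = 1:3, .packages = "R6") %dopar% {
  classes <- make_classes()
  classes$class1$new()$method1(input)
}

stopCluster(cl)
unlist(res)
```

Each task rebuilds the generators, which is cheap here but may matter for large classes.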

Another possibility: reassign class1$parent_env inside the expression passed to foreach():

res <- foreach(input = 1:2, .packages = "R6", .export = c("class1", "class2")) %dopar% {
  class1$parent_env <- environment()
  y <- class1$new()
  y$method1(input)
}

str(res)
#> List of 2
#>  $ :List of 2
#>   ..$ parenvs:List of 4
#>   .. ..$ :<environment: 0x7fb7cb52cf28> 
#>   .. ..$ :<environment: 0x7fb7cb52cfd0> 
#>   .. ..$ :<environment: 0x7fb7cb52cf98> 
#>   .. ..$ :<environment: R_GlobalEnv> 
#>   .. ..- attr(*, "class")= chr "envlist"
#>   ..$ rls    :List of 4
#>   .. ..$ : chr "input"
#>   .. ..$ : chr "self"
#>   .. ..$ : chr [1:4] "class1" "class2" "input" "y"
#>   .. ..$ : chr(0) 
#>  $ :List of 2
#>   ..$ parenvs:List of 4
#>   .. ..$ :<environment: 0x7fb7cb4fda28> 
#>   .. ..$ :<environment: 0x7fb7cb4fd9f0> 
#>   .. ..$ :<environment: 0x7fb7cb4fdbe8> 
#>   .. ..$ :<environment: R_GlobalEnv> 
#>   .. ..- attr(*, "class")= chr "envlist"
#>   ..$ rls    :List of 4
#>   .. ..$ : chr "input"
#>   .. ..$ : chr "self"
#>   .. ..$ : chr [1:4] "class1" "class2" "input" "y"
#>   .. ..$ : chr(0) 

But in that case, I think you have to make sure not to modify class1 in the original R process; doing so would cause problems down the road.
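Applied to the original reproducer, the reassignment can be sketched as follows, assuming a PSOCK cluster from makeCluster(), where each worker receives its own serialized copy of class1. The final check illustrates that the generator in the master session is left alone in that setup. With a sequential %do% loop, by contrast, the body runs in the master session, and because a generator is an environment the assignment would change the original class1 in place, which is presumably the kind of problem mentioned above.

```r
library(R6)
library(foreach)
library(doParallel)

class2 <- R6Class("class2", public = list(
  method2 = function(input) input + 1
))
class1 <- R6Class("class1", public = list(
  method1 = function(input = 1:3) {
    y <- class2$new()
    y$method2(input) * 3
  }
))
before <- class1$parent_env

cl <- makeCluster(2)
registerDoParallel(cl)

# class2 is not referenced in the body, so it has to be exported explicitly.
res <- foreach(input = 1:3, .packages = "R6", .export = "class2") %dopar% {
  # environment() is the worker-side environment holding the exported
  # objects; class1 here is the worker's own copy of the generator.
  class1$parent_env <- environment()
  class1$new()$method1(input)
}

stopCluster(cl)
unlist(res)

# The master session's generator was not modified by the workers.
identical(class1$parent_env, before)
```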

fabsig commented 5 years ago

I have a similar question. I would like to call a method of a class, which in turn uses another class, inside a foreach loop. This does not work. Below is an example.

Many thanks in advance!

library(R6)
library(foreach)
library(doParallel)

method1_internal = function(input = 1:3){
  y = class2$new()
  output = y$method2(input)
  return (output * 3)
}

class2 = R6Class(
  "class2", 
  public = list( 
    method2 = function(input) {
      return (input + 1)
    }
  )
)

class1 = R6Class("class1",
  public = list(
    aux=NULL, ## Auxiliary variable needed so that foreach can access the 'self' object in 'method1'
    method1=function(input = 1:3,call_internal=FALSE,aux){
      self$aux=aux
      registerDoParallel(3)
      z=self$method1_internal(1)
      z = foreach(input = input, .packages = "R6", .export = c("class2","method1_internal")) %dopar% {
        if(call_internal){
          z = self$method1_internal(input)# This doesn’t work. 
        }else{
          z = method1_internal(input)
        }
        return(z)
      }
      stopImplicitCluster()
      return (z)
    },
    method1_internal = function(input = 1:3){#same function as above
      # class2$parent_env <- environment()
      y = class2$new()
      output = y$method2(input)
      return (output * 3)
    }
  )
)

y = class1$new()
y$method1(input=1:3,aux="1",call_internal=FALSE)
y$method1(input=1:3,aux="1",call_internal=TRUE)# This doesn’t work. 
mihaiconstantin commented 4 years ago

Hi @wch, very interesting workarounds. Thanks for suggesting them! I am using the second one, where I reassign the parent_env of a class. I went down this path because I want to implement the foreach within an R6 class (e.g., as a private method) and then call it from the constructor.

I know you are very busy, but I would appreciate it if you could share some of your knowledge about the potential pitfalls of assigning the parent_env as SomeClass$parent_env <- environment(). What exactly does environment() refer to when this expression is called within the %dopar% body of foreach?

Code example

```r
Work <- R6::R6Class("Work",
  public = list(
    values = NULL,

    initialize = function() {
      self$values <- "some values"
    }
  )
)
```

Now, the following `Task` class uses the `Work` class in the constructor.

```r
Task <- R6::R6Class("Task",
  private = list(
    ..work = NULL
  ),

  public = list(
    initialize = function() {
      private$..work <- Work$new()
      Sys.sleep(2)
    }
  ),

  active = list(
    work = function() {
      return(private$..work)
    }
  )
)
```

In the `Factory` class, the `Task` class is created and the `foreach` is implemented in `..m.thread()`.

```r
library(foreach)  # %dopar% must be on the search path

Factory <- R6::R6Class("Factory",
  private = list(
    ..warehouse = list(),
    ..amount = NULL,
    ..parallel = NULL,

    ..m.thread = function(object, ...) {
      cluster <- parallel::makeCluster(parallel::detectCores() - 1)
      doParallel::registerDoParallel(cluster)

      private$..warehouse <- foreach::foreach(1:private$..amount,
        .export = ls(parent.env(self$.__enclos_env__))
      ) %dopar% {
        # What exactly does `environment()` encapsulate?
        object$parent_env <- environment()
        object$new(...)
      }

      parallel::stopCluster(cluster)
    },

    ..s.thread = function(object, ...) {
      for (i in 1:private$..amount) {
        private$..warehouse[[i]] <- object$new(...)
      }
    },

    ..run = function(object, ...) {
      if (private$..parallel) {
        private$..m.thread(object, ...)
      } else {
        private$..s.thread(object, ...)
      }
    }
  ),

  public = list(
    initialize = function(object, ..., amount = 10, parallel = FALSE) {
      private$..amount = amount
      private$..parallel = parallel
      private$..run(object, ...)
    }
  ),

  active = list(
    warehouse = function() {
      return(private$..warehouse)
    }
  )
)
```

The key to making it work is `object$parent_env <- environment()`; otherwise, the error is: `Error in { : task 1 failed - "object 'Work' not found"`.
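One way to answer the environment() question empirically is to return what that environment contains from the workers. A minimal sketch, assuming a PSOCK cluster registered with doParallel; Work is a stripped-down stand-in for the class above. Consistent with wch's pryr output earlier in the thread, the environment should hold the exported objects plus the loop variable (and anything assigned in the body), with R_GlobalEnv as its parent.

```r
library(foreach)
library(doParallel)

# Stripped-down stand-in for the Work class above.
Work <- R6::R6Class("Work", public = list(values = NULL))

cl <- makeCluster(2)
registerDoParallel(cl)

# Work is not referenced in the body, so it is exported explicitly.
res <- foreach(i = 1:2, .export = "Work") %dopar% {
  env <- environment()   # the environment the loop body is evaluated in
  list(
    names = ls(env),     # exported objects, loop variable, local assignments
    parent_is_global = identical(parent.env(env), globalenv())
  )
}

stopCluster(cl)
str(res)
```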