luca-scr / GA

An R package for optimization using genetic algorithms
http://luca-scr.github.io/GA/

Parallel error #25

Closed (neverfox closed this 5 years ago)

neverfox commented 5 years ago

When I try to use the parallel option, it gives the following error: Error in R process: simpleError : task 1 failed - "object of type 'closure' is not subsettable". It works without the parallel option. Any ideas?

> traceback()
4: stop(simpleError(msg, call = expr))
3: e$fun(obj, substitute(ex), parent.frame(), e$data)
2: foreach(i. = seq_len(popSize), .combine = "c") %DO% {
       if (is.na(Fitness[i.])) 
           do.call(fitness, c(list(Pop[i., ]), callArgs))
       else Fitness[i.]
   }
1: ga(type = "permutation", fitness = eval2, lower = 1, upper = 1000, 
       popSize = 10, maxiter = 2, parallel = TRUE)

luca-scr commented 5 years ago

I'm really bad at guessing... Please provide a minimal reproducible example (data + code, see https://en.wikipedia.org/wiki/Minimal_Working_Example) so I can try to reproduce your problem and have a chance of solving it.

neverfox commented 5 years ago

Of course, but it's also worth checking whether this is a recognized problem, or whether the stack trace shows something obvious, before spending the time on a minimal reproducible example. In any case, I believe I understand the nature of the problem. My eval function references global data structures and uses libraries that aren't being exported to the workers, whereas "pure" R functions work fine in parallel mode. Exporting them is easy when setting up a cluster manually (e.g. with clusterEvalQ and clusterExport), but it doesn't appear that you can point GA to a pre-configured cluster, unless I'm mistaken?
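For readers unfamiliar with the manual setup mentioned here, the following is a minimal sketch using only base R's parallel package. The names lookup and score are hypothetical stand-ins for the global data and the eval function. They are not taken from the original code, and library(stats) is only a placeholder for whatever packages the real function needs.

```r
library(parallel)

# Hypothetical global data and a function that depends on it
lookup <- c(a = 1, b = 2, c = 3)
score <- function(key) lookup[[key]] * 10

# Start two local worker processes
cl <- makeCluster(2)

# Copy the global object the function needs into each worker's
# global environment (by default clusterExport reads from .GlobalEnv)
clusterExport(cl, varlist = "lookup")

# Load required packages on every worker (placeholder package here)
invisible(clusterEvalQ(cl, library(stats)))

# Evaluate the function on the workers, one key per task
res <- parSapply(cl, c("a", "b", "c"), score)

stopCluster(cl)
print(res)
```

Without the clusterExport call, the workers would have no lookup object in their global environment, and each task would fail with an "object not found" error.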

luca-scr commented 5 years ago

Packages and objects defined in your current environment are exported to slaves if necessary (i.e. unless multicore is used). To set up the cluster manually please read http://luca-scr.github.io/GA/articles/GA.html#parallel-computing

neverfox commented 4 years ago

Here's a minimal example, showing that the package data.table doesn't work inside of the parallel functionality:

library(data.table)
library(GA)

D <- data.table(v = 1)

fitness <- function (x) {
  D[, v]
}

ga("real-valued", fitness, lower = 0, upper = 1, parallel = TRUE)

# Error in { : task 1 failed - "object 'v' not found"

luca-scr commented 4 years ago

Since the fitness function has argument x and you use v inside the function, it is not a surprise that the error says "object 'v' not found".

neverfox commented 4 years ago

That's intentional: it is valid data.table syntax for accessing a column of the data table. The intention is to demonstrate that you cannot successfully use data.table column access (which doesn't require quoting the column names) inside the fitness function in parallel mode, yet you can if you change parallel to FALSE.

library(data.table)
library(GA)

D <- data.table(SomeColumn = 1)

# valid syntax
D[, SomeColumn]

fitness <- function (x) {
  D[, SomeColumn]
}

ga("real-valued", fitness, lower = 0, upper = 1, parallel = TRUE)
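One thing that may be worth trying is to build the cluster by hand, load data.table on each worker, and pass the cluster object to ga() through the parallel argument, following the pattern in the parallel computing article linked above. This is only a sketch: it assumes the error arises because data.table is not loaded on the workers, which this thread does not confirm. The values of popSize and maxiter are arbitrary and are kept small so the run finishes quickly.

```r
library(GA)
library(data.table)
library(doParallel)  # also attaches foreach and parallel

D <- data.table(SomeColumn = 1)

fitness <- function(x) {
  D[, SomeColumn]
}

# Create and register a two-worker cluster manually
cl <- makeCluster(2)
registerDoParallel(cl)

# Assumption: the workers need data.table loaded so that D[, SomeColumn]
# uses data.table's column lookup instead of data.frame indexing
invisible(clusterEvalQ(cl, library(data.table)))

# Copy D from the global environment to each worker
clusterExport(cl, varlist = "D")

# Pass the pre-configured cluster instead of parallel = TRUE
res <- ga("real-valued", fitness, lower = 0, upper = 1,
          popSize = 10, maxiter = 5, parallel = cl)

# ga() does not stop a cluster it was handed, so stop it explicitly
stopCluster(cl)
```

If the same error still appears with this setup, the cause lies elsewhere, and comparing sessionInfo() output on the master and on the workers would be a reasonable next step.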