Open clarkfitzg opened 8 years ago
We should just do 3. Yes, it's complicated, but many other frameworks, such as foreach, have managed to get it to work. It's the behavior the R user expects.
FYI 1: I handle globals automatically in future with the help of the globals package, which does the heavy lifting. The idea is that the globals package defines how globals are identified and gathered, supporting different strategies for different purposes (e.g. what codetools does is mostly for R CMD check != exporting globals for parallel processing). There are a few corner cases that need to be fixed; that shouldn't be hard (mostly time). I haven't put much effort into finalizing / freezing the globals API itself, but if you are going down that path I can work with you on this.
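To make "identified and gathered" concrete, here is a minimal sketch that uses codetools (which ships with R) rather than the globals package, so it is not how globals itself works. The helper name `gather_globals` is hypothetical.

```r
library(codetools)

# Hypothetical helper: find the non-function globals that `fun` refers to
# and collect their current values from the environment `fun` was defined in.
gather_globals <- function(fun, envir = environment(fun)) {
  # merge = FALSE separates called functions from plain variables
  vars <- findGlobals(fun, merge = FALSE)$variables
  # Keep only the names that can actually be resolved from `envir`
  found <- Filter(function(n) exists(n, envir = envir, inherits = TRUE), vars)
  mget(found, envir = envir, inherits = TRUE)
}

offset <- 10
add_offset <- function(x) x + offset

g <- gather_globals(add_offset)
print(names(g))
```

A list like `g` is what would have to be shipped to the workers alongside the function. Functions defined by the user (as opposed to variables) would need the same treatment, which this sketch skips.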
FYI 2: It didn't take long before I got feedback / wishes to add support for manually controlling globals as an alternative (https://github.com/HenrikBengtsson/future/issues/84), so I've recently added support for that too in future. Some of this new code might be migrated to the globals package.
PS. I think foreach only handles globals at the R prompt / global environment(?), but as soon as you start using foreach() within functions you need to specify globals explicitly.
Interesting packages @HenrikBengtsson, I'll give those a try. A list of variable names seems friendlier than an environment, but environments work nicely with do.call.
The workhorse for SparkR is their processClosure function.
Seems like we could support both 2 and 3 by adding an argument, roughly like
dlapply <- function(..., env = NULL) {
    if (is.null(env)) env <- fancy_environment_maker(...)
    ...
}
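A rough, purely local sketch of that idea follows. It is not ddR code: `dlapply_sketch` and `make_env` are hypothetical names, `make_env` stands in for the `fancy_environment_maker` placeholder, and plain `lapply` stands in for the distributed backend.

```r
library(codetools)

# Hypothetical stand-in for `fancy_environment_maker`: build a fresh
# environment holding only the variables that FUN refers to.
make_env <- function(FUN) {
  src <- environment(FUN)
  vars <- findGlobals(FUN, merge = FALSE)$variables
  vars <- Filter(function(n) exists(n, envir = src, inherits = TRUE), vars)
  # parent = baseenv() keeps base functions such as `+` reachable;
  # functions from other packages would need extra handling.
  list2env(mget(vars, envir = src, inherits = TRUE), parent = baseenv())
}

# Hypothetical local analogue of dlapply with an optional `env` argument.
dlapply_sketch <- function(X, FUN, env = NULL) {
  if (is.null(env)) env <- make_env(FUN)  # automatic detection
  environment(FUN) <- env                 # FUN now sees only `env`
  lapply(X, FUN)
}

shift <- 100
f <- function(x) x + shift

# Globals detected automatically
auto <- dlapply_sketch(1:3, f)

# Environment supplied explicitly by the caller
manual_env <- list2env(list(shift = -1), parent = baseenv())
manual <- dlapply_sketch(1:3, f, env = manual_env)
```

With `env = NULL` the function picks up `shift` from where `f` was defined; with an explicit `env` the caller decides exactly what `f` can see.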
The changes in #15 brought this bug out. Global variables are not exported to a PSOCK cluster. This causes the kmeans example to fail. A minimal example:
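The snippet from the original report is not reproduced here. As an illustration of the same failure mode, the sketch below uses the base parallel package directly rather than ddR, with made-up names (`scale_factor`, `scale_it`), and assumes it is run at top level, e.g. with Rscript.

```r
library(parallel)

cl <- makePSOCKcluster(2)

scale_factor <- 3
scale_it <- function(x) x * scale_factor

# Each PSOCK worker is a fresh R process, so `scale_factor` does not exist
# there and the call errors on the workers.
failed <- tryCatch(
  {
    parLapply(cl, 1:4, scale_it)
    FALSE
  },
  error = function(e) TRUE
)

# Exporting the variable by name makes the same call succeed.
clusterExport(cl, "scale_factor", envir = environment())
result <- unlist(parLapply(cl, 1:4, scale_it))

stopCluster(cl)
print(failed)
print(result)
```

The explicit `clusterExport` call is the manual route; automatic detection of globals would amount to working out that list of names for the user.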
So I think we should make a call for how ddR should work for portability. Here's what I see as the options:
Right now 2) is the most appealing, because it's clear what's happening. 1) would not be enough: for example, I often compose a large function out of several small functions. 3) is appealing, but is significantly more complex.