dselivanov opened 7 years ago
I think the child processes still share that memory with the parent process.
Actually, at the moment things are more complicated (and weird) even for the `parallel.fork` driver:
```r
library(ddR)
library(Rcpp)
library(parallel)

# Allocate a std::vector<int> outside R's heap and return an external pointer to it
cppFunction(
"
SEXP init_std_vector(IntegerVector x) {
  std::vector<int> *xx = new std::vector<int>(x.size());
  for (int i = 0; i < x.size(); i++)
    xx->at(i) = x[i];
  XPtr< std::vector<int> > ptr(xx, true);
  return ptr;
}
")

# Copy the vector behind an external pointer back into an R integer vector
cppFunction(
"
IntegerVector get_std_vector(SEXP ptr) {
  XPtr< std::vector<int> > vec(ptr);
  return wrap(*vec);
}
"
)

ddr = useBackend("parallel", 2, type = "FORK")
# Each worker creates one external pointer ...
v1 = dmapply(function(x) init_std_vector(1:10), list(1, 2), output.type = "dlist", combine = "c")
# ... and a second pass tries to read the vectors back through those pointers
v2 = dmapply(function(x) get_std_vector(x), v1, output.type = "dlist", combine = "c")
```

```
Error in checkForRemoteErrors(val) : 2 nodes produced errors; first error: external pointer is not valid
```
I think this is because:
```r
collect(v1)
```

```
[[1]]
<pointer: 0x0>

[[2]]
<pointer: 0x0>
```
Drive-by comment regarding https://github.com/vertica/ddR/issues/25#issuecomment-290913798:
Yes, this is because these on-the-fly compiled Rcpp objects hold external-pointer references, making them non-exportable/non-serializable. FWIW, this is mentioned in https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html.
The solution is to not compile on-the-fly and instead put Rcpp code in a package.
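The non-serializable part can be checked directly, without ddR or forking. The following is a minimal sketch assuming Rcpp and a working C++ toolchain; `make_xptr` and `xptr_is_null` are hypothetical helpers written only for this illustration (the first mirrors `init_std_vector` above). It round-trips an external pointer through `serialize()`/`unserialize()`, which is what sending an object to another R process amounts to.

```r
library(Rcpp)

# Hypothetical helper: put a std::vector<int> on the C++ heap and return
# an external pointer to it (same pattern as init_std_vector)
cppFunction("
SEXP make_xptr(IntegerVector x) {
  std::vector<int> *v = new std::vector<int>(x.begin(), x.end());
  XPtr< std::vector<int> > ptr(v, true);
  return ptr;
}
")

# Hypothetical helper: TRUE if the external pointer's address is NULL
cppFunction("
bool xptr_is_null(SEXP p) {
  return R_ExternalPtrAddr(p) == NULL;
}
")

p <- make_xptr(1:10)
xptr_is_null(p)   # FALSE: the address points at the C++ vector

# Serialization keeps the pointer object but not the memory it refers to
q <- unserialize(serialize(p, NULL))
xptr_is_null(q)   # TRUE: prints as <pointer: 0x0>, as in collect(v1)
```

Only the R-side wrapper survives the round trip; the data lives outside R's heap, so there is nothing for `serialize()` to write out.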
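Looking at `fork_driver.R`, we can see that `dmapply` is essentially `mcmapply`. So we rely on the fact that every object can easily be lazily copied from the master to the worker processes. But we are missing the fact that elements of a `dlist` can be objects which keep some of their data outside R's heap, behind external pointers. Forking covers the master-to-worker direction, but results travel back from the workers to the master through serialization, and an external pointer does not survive serialization: it arrives as `<pointer: 0x0>`, which is exactly what `collect(v1)` shows.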
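Both directions can be seen with `parallel::mclapply`, a close sibling of `mcmapply`, without ddR in the loop. This is a minimal sketch assuming a Unix-alike (forking with `mc.cores > 1` is not available on Windows), Rcpp and a C++ toolchain; `make_xptr`, `xptr_is_null`, `master_ptr`, `inherited` and `returned` are hypothetical names used only for this illustration.

```r
library(Rcpp)
library(parallel)

# Hypothetical helper: external pointer to a std::vector<int> on the C++ heap
cppFunction("
SEXP make_xptr(IntegerVector x) {
  std::vector<int> *v = new std::vector<int>(x.begin(), x.end());
  XPtr< std::vector<int> > ptr(v, true);
  return ptr;
}
")

# Hypothetical helper: TRUE if the external pointer's address is NULL
cppFunction("
bool xptr_is_null(SEXP p) {
  return R_ExternalPtrAddr(p) == NULL;
}
")

# Master -> worker: a pointer created before forking is inherited by the
# children together with the memory it points to, so it is still valid there
master_ptr <- make_xptr(1:10)
inherited <- mclapply(1:2, function(i) xptr_is_null(master_ptr), mc.cores = 2)
unlist(inherited)   # FALSE FALSE

# Worker -> master: a pointer created inside a child is serialized on its
# way back to the master and arrives with a NULL address
returned <- mclapply(1:2, function(i) make_xptr(1:10), mc.cores = 2)
sapply(returned, xptr_is_null)   # TRUE TRUE
```

This matches the ddR repro: `v1` is built inside the workers, so its elements come back as null pointers, and the second `dmapply` then hands those null pointers to `get_std_vector`.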