vertica / ddR

Standard API for Distributed Data Structures in R
GNU General Public License v2.0

Fork backend potential issues - external pointers #25

Open dselivanov opened 7 years ago

dselivanov commented 7 years ago

Looking at fork_driver.R, we can see that dmapply is essentially mcmapply, so we rely on every object being cheaply (lazily) copyable from the master to the worker processes. But this overlooks the possibility that elements of a dlist are objects that keep some of their data outside R's heap, behind external pointers.

lawremi commented 7 years ago

I think the child processes still share that memory with the parent process.
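To illustrate the shared-memory point, here is a minimal sketch. It assumes a Unix-alike (forking is not available on Windows) and a working Rcpp toolchain; `p` and `res` are illustrative names, and the two helpers are the ones from the reproduction later in this thread. A forked child inherits the parent's address space, so an external pointer created in the parent can still be dereferenced inside the child:

```r
library(Rcpp)
library(parallel)

# Helpers copied from the reproduction later in this thread.
cppFunction("
SEXP init_std_vector(IntegerVector x) {
  // copy x into a heap-allocated std::vector owned by an external pointer
  std::vector<int> *xx = new std::vector<int>(x.size());
  for (int i = 0; i < x.size(); i++)
    xx->at(i) = x[i];
  XPtr< std::vector<int> > ptr(xx, true);
  return ptr;
}")

cppFunction("
IntegerVector get_std_vector(SEXP ptr) {
  // dereference the external pointer and copy the data back into R
  XPtr< std::vector<int> > vec(ptr);
  return wrap(*vec);
}")

# The pointer is created in the parent process ...
p <- init_std_vector(1:10)

# ... and read inside forked children, which return a plain integer.
res <- mclapply(1:2, function(i) sum(get_std_vector(p)), mc.cores = 2)
unlist(res)  # 55 55
```

Note that this only covers reading a parent-side pointer in the child. Here each child returns an ordinary integer, not the pointer itself.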

dselivanov commented 7 years ago

Actually, at the moment things are more complicated/weird even for the parallel.fork driver:

library(ddR)
library(Rcpp)
library(parallel)

cppFunction(
"
SEXP init_std_vector(IntegerVector x) {
  std::vector<int> *xx = new std::vector<int>(x.size());
  for(int i = 0; i < x.size(); i++)
    xx->at(i) = x[i];
  XPtr< std::vector<int> > ptr(xx, true);
  return ptr;
}
")

cppFunction(
"  
IntegerVector get_std_vector(SEXP ptr) {
  XPtr< std::vector<int> > vec(ptr);
  return wrap(*vec);
}
"
)
ddr = useBackend("parallel", 2, type = "FORK")
v1 = dmapply(function(x) init_std_vector(1:10), list(1, 2), output.type = "dlist", combine = "c")
v2 = dmapply(function(x) get_std_vector(x), v1, output.type = "dlist", combine = "c")

Error in checkForRemoteErrors(val) : 2 nodes produced errors; first error: external pointer is not valid

I think this is because:

collect(v1)

[[1]]
<pointer: 0x0>

[[2]]
<pointer: 0x0>

HenrikBengtsson commented 3 years ago

Drive-by comment regarding https://github.com/vertica/ddR/issues/25#issuecomment-290913798:

Yes, this is because these on-the-fly compiled Rcpp objects hold external-pointer references, which makes them non-exportable/non-serializable. FWIW, this is mentioned in https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html.

The solution is to not compile on the fly and instead put the Rcpp code in a package.
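The non-serializable behaviour of external pointers can be reproduced in a single R process, without ddR or any workers. This is only a sketch, assuming a working Rcpp toolchain; `make_ptr` and `ptr_is_null` are hypothetical helper names introduced for this illustration, not part of ddR or Rcpp:

```r
library(Rcpp)

cppFunction("
SEXP make_ptr() {
  // heap-allocate a vector and hand ownership to R via an external pointer
  XPtr< std::vector<int> > ptr(new std::vector<int>(10, 1), true);
  return ptr;
}")

cppFunction("
bool ptr_is_null(SEXP ptr) {
  // TRUE when the external pointer no longer refers to any address
  return R_ExternalPtrAddr(ptr) == NULL;
}")

p <- make_ptr()
before <- ptr_is_null(p)                 # FALSE: valid in the creating process

# Round-trip through R serialization, as happens when an object is
# shipped between processes.
p2 <- unserialize(serialize(p, NULL))
after <- ptr_is_null(p2)                 # TRUE: the address is not preserved

c(before = before, after = after)
```

R's serialization stores an external pointer's tag and protected value but not its address, so the restored object points at nothing. That would be consistent with the `<pointer: 0x0>` entries returned by `collect(v1)`, assuming the backend moves results between processes through serialization.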