kaneplusplus / bigmemory

126 stars 24 forks source link

Core dumping R if using a serialized 'big.matrix' object #100

Open HenrikBengtsson opened 4 years ago

HenrikBengtsson commented 4 years ago

Hi, I discovered that bigmemory core dumps (=crashes/terminates) parallel workers if attempting to use 'big.memory' objects. This appears to be because there is an assumption that the object is always used in the same R process that it was created in, which does not work because of the external pointer. Here is a minimal reproducible example:

In one R session, do:

X <- bigmemory::as.big.matrix(matrix(1:2, nrow = 2L))
saveRDS(X, "X.rds")
quit("no")

In another R session, do:

> library(bigmemory)
> X <- readRDS("X.rds")
> X[1,1]

 *** caught segfault ***
address 0x10, cause 'memory not mapped'

Traceback:
 1: CGetNrow(x@address)
 2: nrow(x)
 3: nrow(x)
 4: CCleanIndices(as.double(i), as.double(nrow(x)))
 5: GetElements.bm(x, i, j)
 6: .local(x, i, j, ..., drop)
 7: X[1, 1]
 8: X[1, 1]

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 1
R is aborting now ...
Segmentation fault (core dumped)
> library(bigmemory)
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/tags/R-4-0-2/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/tags/R-4-0-2/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bigmemory_4.5.36

loaded via a namespace (and not attached):
[1] bigmemory.sri_0.1.3 compiler_4.0.2      Rcpp_1.0.5  

Suggestion

Instead of core dumping, detect the problem and give an informative error message:

> library(bigmemory)
> X <- readRDS("X.rds")
> X[1,1]
Error: It is not possible to use a 'big.matrix' that was created in another R process

I don't know the internals, but I assume the problem is that the external pointer:

> X <- bigmemory::as.big.matrix(matrix(1:2, nrow = 2L))
> X@address
<pointer: 0x560457f389b0>

is used without making sure it is still valid.

PS. I consider this a quite serious bug since it can core dump R and parallel workers in R and it's hard to protect against it. People who run parallel code might not even know that bigmemory is used as part of some other package they rely on. This is the first package that I know of that use external pointers that also core dumps R, cf. https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html. It looks like those other packages can detect the problem and prevent core dumping, so, hopefully, it is not to hard to protect against this.

EDIT 2020-08-15 @ 18:11 UTC: Clarified that the bug fix should be to give an informative error message instead of core dumping.

privefl commented 4 years ago

I think this behaviour is fairly known, and this is why descriptors are used instead in parallel settings.

A cleaner way would be to use active bindings of reference classes as proposed in https://github.com/phaverty/bigmemoryExtras by @phaverty, which I've borrowed in bigstatsr.

HenrikBengtsson commented 4 years ago

I realized my request for a bug fix was not clear. I'm not asking to make it possible to export bigmemory objects. The ask is to detect the problem and give an informative error message rather than core dumping. I've updated my top comment accordingly.

privefl commented 4 years ago

Apparently, {magick} has a way to prevent from this:

library(magick)
tiger <- image_read_svg('http://jeroen.github.io/images/tiger.svg', width = 400)
saveRDS(tiger, tmp <- tempfile(fileext = ".rds"))
readRDS(tmp)

Error: Image pointer is dead. You cannot save or cache image objects between R sessions.

But I'm not sure how they manage to do this.

privefl commented 4 years ago

Maybe @jeroen can give a clue.

jeroen commented 4 years ago

You just need to check that your XPtr is not NULL, before using it, everywhere where you use:

Rcpp::XPtr<BigMatrix> pMat(bigMatAddr);
privefl commented 4 years ago

I've just tried checking for bigMatAddr == NULL, but it does not seem to do anything?

jeroen commented 4 years ago

That gives you the address of the SEXP object itself. To get the externalpointer, I think you need pMat.get() or just *pMat may work as well.

You can also use BigMatrix* mat = pMat.checked_get() which is designed exactly for this cause: it will raise an error if the pointer is NULL.

privefl commented 4 years ago

That is very useful information, thanks!

privefl commented 4 years ago

I've tried using BigMatrix * pMat = (as< XPtr<BigMatrix> >(bigMatAddr)).checked_get(); instead of Rcpp::XPtr<BigMatrix> pMat(bigMatAddr);. It does not seem to prevent from crashing.

From what I see from {magick}, you do assert_image() in R, not C++. I still believe that the active binding is the way to go to solve this problem, as it prevents from having to check every single R function.

As {bigmemory} is not using an RC object (but S4), I'm not sure it can use active bindings. I just wonder if we could just wrap the externalptr class within another one doing the protection.

jeroen commented 4 years ago

All that the assert_image function in magick does is call magick_image_dead which checks if the pointer is NULL.