s-u / Rserve

Fast, flexible and powerful server providing access to R from many languages and systems
http://RForge.net/Rserve

[Error] BODY DUMP FAILED for reading large (sparse) matrices #161

Closed: dcellwanger closed this issue 3 years ago.

dcellwanger commented 3 years ago

Hello @s-u,

Thanks for developing and maintaining Rserve!

I am running into a problem loading large sparse matrices with Rserve, and I hope you can help resolve it.

The code below reproduces the error on both Linux and macOS.

Generating some data first:

set.seed(1101)
M <- Matrix::rsparsematrix(nrow=16624, 
                           ncol=101249, 
                           density=1-0.9348445)
D <- list(M, M)
saveRDS(D, file="test.rds")

The list occupies about 2.5 GiB of memory, and the exported RDS file is about 835M on disk.
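As a rough cross-check of these sizes, the sketch below estimates the raw storage of one matrix. It assumes the standard dgCMatrix layout (`x` as 8-byte doubles and `i` as 4-byte row indices, one entry per nonzero, plus `p` as ncol + 1 int column pointers) and that `rsparsematrix()` creates `round(density * nrow * ncol)` nonzeros by default. Header and attribute overhead is ignored, and the class name is purely illustrative.

```java
// Back-of-the-envelope estimate of the payload produced by the reproduction script.
// Assumptions (not taken from Rserve itself): dgCMatrix stores nonzeros in the slots
// `x` (8-byte doubles) and `i` (4-byte ints), plus `p` (ncol + 1 ints); the few hundred
// bytes of attribute/header overhead are ignored. Class name is hypothetical.
final class SparsePayloadEstimate {

    /** Nonzero count, assuming rsparsematrix's default nnz = round(density * nrow * ncol). */
    static long nonZeros(long nrow, long ncol, double density) {
        return Math.round(density * (double) (nrow * ncol));
    }

    /** Approximate bytes for the i, p and x slots of one dgCMatrix. */
    static long matrixBytes(long nrow, long ncol, double density) {
        long nnz = nonZeros(nrow, ncol, density);
        long perNonZero = Double.BYTES + Integer.BYTES; // x value + row index i
        long columnPointers = (ncol + 1) * Integer.BYTES; // p slot
        return nnz * perNonZero + columnPointers;
    }

    public static void main(String[] args) {
        long one = matrixBytes(16624, 101249, 1 - 0.9348445);
        long both = 2 * one; // D <- list(M, M) holds two copies
        System.out.printf("one matrix:  %,d bytes%n", one);
        System.out.printf("list(M, M):  %,d bytes (%.2f GiB)%n", both, both / Math.pow(2, 30));
        System.out.printf("exceeds Integer.MAX_VALUE: %b%n", both > Integer.MAX_VALUE);
    }
}
```

Under these assumptions one matrix comes to roughly 1.316e9 bytes, within a few hundred bytes of the `stored ... 1316413380 bytes` entries in the debug log, and the two-element list comes to about 2.63e9 bytes, already above `Integer.MAX_VALUE`.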

Running Rserve on RHEL 7 (an AWS EC2 instance with 16 vCPUs and 128 GiB of memory):

> Rserve::Rserve(debug=TRUE)
Starting Rserve:
 /opt/R/4.0.2/lib/R/bin/R CMD /opt/R/4.0.2/lib/R/library/Rserve/libs//Rserve.dbg 

Note: debug version of Rserve doesn't daemonize so your R session will be blocked until you shut down Rserve.
Rserve 1.8-7 () (C)Copyright 2002-2013 Simon Urbanek
$Id$

Loading config file /etc/Rserv.conf
conf> command="eval", parameter="library("Matrix")"
Found source entry "library("Matrix")"
conf> command="maxinbuf", parameter="10000000"
conf> command="maxsendbuf", parameter="0"
conf> command="encoding", parameter="utf8"
Loaded config file /etc/Rserv.conf

R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Executing source/eval commands from the config file.
voidEval("library("Matrix")")
voidEval: buffer parsed, stat=1, parts=1
result type: 20, length: 1
R_tryEval(xp,R_GlobalEnv,&Rerror);
Calling R_tryEval for expression 1 [type=6] ...
Expression 1, error code: 0
Done with initial commands.
 - create_server(port = 6311, socket = <NULL>, mode = 0, flags = 0x4000)
INFO: adding server 0x3bd4970 (total 1 servers)
Rserve: Ok, ready to answer queries.

Loading the generated data into an Rserve session from Java, using the REngine.jar/Rserve.jar client libraries shipped with the Rserve 1.8-7 R package, then triggers the error:

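import org.rosuda.REngine.REXP;
import org.rosuda.REngine.Rserve.RConnection;

RConnection c = new RConnection(); // connects to localhost:6311 by default
String cmd = "dat <- readRDS('test.rds')";
REXP r = c.parseAndEval("try(" + cmd + ", silent=FALSE)");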

Here is the tail of the stdout from Rserve(debug=TRUE):

getStorageSize(0x7fbcee01cdd0,type=1,len=140449415188712) = 12
stored 0x7fbcee01cdd0 at 0x24a97efa4, 8 bytes
stored 0x7fbcf2f54150 at 0x1fc210fe8, 1316413380 bytes
stored 0x7fbcf2f54118 at 0x1fc210fe0, 1316413388 bytes
stored 0x7fbcefc03188 at 0x1adaa3008, 2632826788 bytes
stored SEXP; length=2632826800 (incl. DT_SEXP header)
OUT.sendRespData
HEAD DUMP [16]: 01 00 01 00 b0 bf ed 9c 00 00 00 00 00 00 00 00  |................
BODY DUMP FAILED (len=-1662140496)
DUMP [-1662140496]:  |
Connection closed by peer.
done.
Error: ignoring SIGPIPE signal
Fatal error: unable to initialize the JIT

The same error can be reproduced on macOS. Here is the session info for the macOS test run:

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.0.2 Rserve_1.8-7  

Thanks for your time! It is greatly appreciated!

Best regards, Daniel

s-u commented 3 years ago

@dcellwanger unfortunately, this is a limitation of the current Java client, which only supports object sizes below 2^31 bytes. The root cause is that Java arrays can only be indexed by signed 32-bit integers, so the largest possible array in Java has 2^31-1 = 2,147,483,647 elements. Although Rserve itself supports data sizes up to ~7e16 bytes, the Java client does not.
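To make the limit concrete: the length of a Java `byte[]` is an `int`, so no single array can hold the 2,632,826,800-byte response from this issue. The sketch below only illustrates that bound; the class and method names are hypothetical, and it is not how the REngine/Rserve client reads responses internally. (In practice, JVMs often cap array sizes slightly below `Integer.MAX_VALUE`.)

```java
// Illustration of the Java array-size bound discussed above. The payload length is a
// 64-bit quantity, but a byte[] can have at most Integer.MAX_VALUE (2^31 - 1) elements.
// Names are hypothetical and not part of the REngine/Rserve API.
final class JavaArrayLimit {

    /** True if a payload of this many bytes could, in principle, fit in one byte[]. */
    static boolean fitsInByteArray(long payloadBytes) {
        return payloadBytes >= 0 && payloadBytes <= Integer.MAX_VALUE;
    }

    /** Allocates a receive buffer, refusing sizes that a Java array cannot represent. */
    static byte[] allocateReceiveBuffer(long payloadBytes) {
        if (!fitsInByteArray(payloadBytes)) {
            throw new IllegalArgumentException(
                "payload of " + payloadBytes + " bytes exceeds the Java array limit of "
                + Integer.MAX_VALUE + " bytes");
        }
        return new byte[(int) payloadBytes];
    }

    public static void main(String[] args) {
        long reported = 2632826800L; // response length from the Rserve header in this issue
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("fits in byte[]: " + fitsInByteArray(reported));
        try {
            allocateReceiveBuffer(reported);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```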

FWIW, the debugging DUMP code in Rserve also uses 32-bit integers, which is why the dump reports a negative length (2632826800 - 2^32 = -1662140496), but the actual payload is correct; the response header is:

Header:  01 00 01 00 b0 bf ed 9c 00 00 00 00 00 00 00 00

which correctly encodes a length of 2632826800 bytes.
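The header can be decoded by hand. This sketch assumes the QAP1 layout described in the Rserve protocol documentation: four little-endian 32-bit integers holding the command or response code, the low 32 bits of the body length, the data offset, and the high 32 bits of the body length. It also shows that truncating the 64-bit length to a Java `int` yields exactly the negative value printed by the debug dump. The class and helper names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Decodes a 16-byte QAP1 response header. Assumed layout (per the Rserve QAP1 docs):
// [0] response code, [4] body length bits 0-31, [8] data offset, [12] body length bits 32-63,
// all little-endian 32-bit ints. Class and helper names are hypothetical.
final class QapHeaderDecode {

    static int responseCode(byte[] header) {
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }

    static long bodyLength(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        long low = Integer.toUnsignedLong(buf.getInt(4));   // must be read as unsigned
        long high = Integer.toUnsignedLong(buf.getInt(12));
        return (high << 32) | low;
    }

    /** Parses a space-separated hex dump such as the one in the debug log. */
    static byte[] parseHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int k = 0; k < parts.length; k++) {
            out[k] = (byte) Integer.parseInt(parts[k], 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] header = parseHex("01 00 01 00 b0 bf ed 9c 00 00 00 00 00 00 00 00");
        long len = bodyLength(header);
        System.out.printf("response code: 0x%x%n", responseCode(header));
        System.out.println("body length (64-bit): " + len);
        // Narrowing to a 32-bit int reproduces the negative length in the debug dump.
        System.out.println("as 32-bit int: " + (int) len);
    }
}
```

For the header in this issue it yields response code 0x10001 and a body length of 2632826800; narrowed to an `int`, the same length becomes -1662140496, matching `BODY DUMP FAILED (len=-1662140496)`.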