kollerma / robustlmm

This is an R package for fitting linear mixed-effects models in a robust manner. The method is based on the robustification of the scoring equations and an application of the Design Adaptive Scale approach.

`rlmer` fails with memory error for a 30,904-row dataframe, even when provisioning a virtual machine with arbitrarily large RAM #23

Closed tiburona closed 2 years ago

tiburona commented 2 years ago

I'm currently on an Apple M1 Mac Pro with 16 gigs of RAM. `rlmer` fails with this error: `Error: vector memory exhausted (limit reached?)`

RStudio offers this traceback (function 1 is mine, obviously):

13. eval(call, parent.frame())
12. eval(call, parent.frame())
11. callGeneric(x1, x2)
10. .nextMethod(.Object = .Object, ... = ...)
9. callNextMethod()
8. initialize(value, ...)
7. initialize(value, ...)
6. new("dgeMatrix", Dim = d, Dimnames = dn, x = callGeneric(x1, x2))
5. tcrossprod(.U_eX %*% r$M_BB, .U_eX) + tmp2
4. tcrossprod(.U_eX %*% r$M_BB, .U_eX) + tmp2
3. lobj@pp$updateMatrices()
2. rlmer(eval(model), data = data)
1. process_data(model = "std_brain ~ type*taught*std_beh + (1|subject/cluster_name)",
       "datafile", datafile = file.path(resdir, "robust0005", "pos_rel_con__all_clusters.csv"))

I was working on this several months ago, and I also tried provisioning machines on AWS with more than 100 gigs of RAM; that didn't stop rlmer from failing with a memory error. If I split my dataset in half, rlmer runs, so the limit it's encountering is somewhere between 15,452 and 30,904 rows.

kollerma commented 2 years ago

Thanks for reporting this. It's expected that rlmer needs more memory than lmer. Running a model with 30,000 rows on 16GB may be a stretch, but I would definitely expect 100GB to be enough. After all, the largest dense matrix required in the process is a 30000 x 30000 matrix, which should take about 7.2GB of memory. Maybe 2 or 3 times that to account for a bit of inefficiency, but not more than 100GB.
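(As a quick sanity check of that figure in plain R; the numbers are just the ones quoted above:)

n <- 30000
# dense numeric matrix: 8 bytes per entry
n * n * 8 / 1e9   # 7.2, i.e. about 7.2 GB (roughly 6.7 GiB)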

I have profiled the code a bit and found that R is using a multiple of the minimum amount of memory. That is partly due to bad code, but also because of how R handles (and has to handle) things. The Rcpp implementation of rlmer, rlmerRcpp, does this a little bit better. Have you tried fitting the model using rlmerRcpp yet?
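(For example, a sketch assuming rlmerRcpp takes the same formula/data arguments as rlmer, with the formula and data object from the traceback above:)

library(robustlmm)
# illustrative call; `data` is the data frame from the original rlmer call
fit <- rlmerRcpp(std_brain ~ type * taught * std_beh + (1 | subject/cluster_name),
                 data = data)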

As for rlmer, with a few small changes I was able to reduce the memory required for some test data by about 60%. Looking at the formulas involved, it looks as if much more could be saved with a few more tricks, possibly dropping the requirement to hold a 30000 x 30000 matrix at any one point in time altogether. The latter would be easiest to do for the rlmerRcpp version, as it would involve computing things row-wise in a loop.
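(To illustrate the row-wise idea -- this is not the package code: for a product of the form U %*% M %*% t(U), a single row can be computed on demand instead of materializing the full n x n result.)

# illustrative only: row i of U %*% M %*% t(U), with U of size n x k and M of size k x k
row_of_product <- function(U, M, i) {
  drop(U[i, , drop = FALSE] %*% M %*% t(U))   # returns a length-n vector
}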

I'll have to see whether things actually work out, but I will try to work on that over the next few weeks.

It would help if you could send me your dataset so I don't have to do as much guessing. I don't need the real data; you can simulate fake responses and remove the labels of the levels. The structure is the important bit.

tiburona commented 2 years ago

Thanks for your quick response! I'm happy to send you my actual dataset; nothing in it is particularly sensitive, but I can't find a contact email for you. Can you help with that? Or I suppose I can make some changes and just drop it here: nonsense_data.csv

Also, just as an update, when I set this line in .Renviron on my local machine:

R_MAX_VSIZE=100Gb

it still fails, but differently: it runs for two hours and then crashes. I was trying to verify whether I had changed the .Renviron on the EC2 instances I'd provisioned when I did this before. I believe I did, but I'm having much more trouble getting my EC2 instance up and running than I did before, so I'm having trouble proving that to myself.
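(A general way to confirm the setting was picked up after restarting R, not specific to robustlmm:)

Sys.getenv("R_MAX_VSIZE")   # should return "100Gb" if .Renviron was read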

kollerma commented 2 years ago

Thanks for sending the dataset. I am going to use the formula D ~ A*B*C + (1|subject/Run); is that correct?

I'm not sure whether setting R_MAX_VSIZE is needed. I don't have it set on my machine, but then I have never needed more memory than the machine had available anyway.

tiburona commented 2 years ago

You actually alerted me to the fact that, in my attempts to make this run, I had accidentally dropped a term from my model! It should be C ~ D * A * B + (1|subject/Run) + (1|subject:div_name). I also missed the part about rlmerRcpp. I did just try it, and it fails faster than rlmer (by crashing R), which is itself helpful, but it still doesn't complete. I'll get back to getting an EC2 instance up and running and see whether I can make rlmerRcpp run on a virtual machine with a lot of memory.
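(In terms of the shared nonsense_data.csv, the corrected call would look something like the sketch below; the object name nonsense_data is just illustrative.)

nonsense_data <- read.csv("nonsense_data.csv")
fit <- rlmerRcpp(C ~ D * A * B + (1 | subject/Run) + (1 | subject:div_name),
                 data = nonsense_data)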

kollerma commented 2 years ago

I have pushed a new version of robustlmm that avoids computing the n x n matrix and instead only ever computes one row and column at a time. You may even be able to fit the model on your laptop now. The new code may be slow; I didn't have any time to profile it. I have only changed the implementation for rlmer, as rlmerRcpp may be failing for other reasons (likely a bug that is triggered by complex random-effects structures; I have never found the time to debug it).

You can get the new version by running `devtools::install_github("kollerma/robustlmm")`.

tiburona commented 2 years ago

Hey, I'd be happy to check it out (thanks so much for your prompt attention!), but installation is currently failing (maybe uniquely on my Apple Silicon architecture).

Here's some output:

In file included from robustlmm_init.cpp:1:
In file included from ./globals.h:4:
In file included from ./PsiFunction.h:5:
In file included from ./Integration.h:4:
In file included from ./misc.h:9:
In file included from /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/RcppEigen/include/RcppEigenForward.h:40:
In file included from /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/RcppEigen/include/unsupported/Eigen/SparseExtra:51:
/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/RcppEigen/include/unsupported/Eigen/../../Eigen/src/Core/util/ReenableStupidWarnings.h:14:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
18 warnings generated.
clang -arch arm64 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -DSTRICT_R_HEADERS -DUSE_FC_LEN_T -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/RcppEigen/include' -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/robustbase/include' -I'/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/cubature/include' -I/opt/R/arm64/include   -fPIC  -falign-functions=64 -Wall -g -O2  -c zeroin.c -o zeroin.o
clang++ -arch arm64 -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/opt/R/arm64/lib -o robustlmm.so DasScale.o FitEffects.o Fitter.o Integration.o PsiFunction.o fastGHQuad.o helpers.o index.o misc.o rlmerPredDModule.o rlmerRespModule.o robustlmm_init.o zeroin.o -L/Library/Frameworks/R.framework/Resources/lib -lRlapack -L/Library/Frameworks/R.framework/Resources/lib -lRblas -L/opt/R/arm64/gfortran/lib/gcc/aarch64-apple-darwin20.6.0/12.0.1 -L/opt/R/arm64/gfortran/lib -lgfortran -lemutls_w -lquadmath -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
ld: warning: directory not found for option '-L/opt/R/arm64/gfortran/lib/gcc/aarch64-apple-darwin20.6.0/12.0.1'
ld: warning: directory not found for option '-L/opt/R/arm64/gfortran/lib'
ld: library not found for -lgfortran
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [robustlmm.so] Error 1
ERROR: compilation failed for package ‘robustlmm’
* removing ‘/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/robustlmm’
* restoring previous ‘/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/robustlmm’
Warning message:
In i.p(...) :
  installation of package ‘/var/folders/7_/l3q1512j0qb8dwpnv24_r9d40000gn/T//RtmpeHe2iY/file553438392f4/robustlmm_3.0-5.tar.gz’ had non-zero exit status
> 

Meanwhile, I have an EC2 instance up and running, trying to run rlmerRcpp right now. Once that has succeeded or failed, I can try to install this new robustlmm on the virtual machine and see whether it installs on a different architecture.

kollerma commented 2 years ago

You are missing the gfortran compiler. You can find instructions on how to install it at https://mac.r-project.org/tools/.

kollerma commented 2 years ago

Just pushed a new version that improves the new code from the previous update. It's more than 40 times faster. Things should be even faster for rlmerRcpp, once I get it to run.

tiburona commented 2 years ago

Thanks! I'll give it a try. I did get your previous version to run, but it was in fact quite slow -- it took many hours of processor time.

kollerma commented 2 years ago

Pushed a new version with the more efficient approach for rlmerRcpp. I've also fixed a bug that made R crash when there was more than one variance component.

kollerma commented 2 years ago

I've pushed yet another version. I've removed rlmerRcpp completely; it didn't give much of a performance boost and was too complicated to maintain. Instead, I've converted some crucial bits to call BLAS routines directly from C++ code. There is more to optimize, but that will have to wait a bit for now.

The performance of the new version depends on whether or not an optimized BLAS is used; the reference BLAS is about 10 times slower. On a Mac you can switch to the vecLib BLAS with the following commands:

cd /Library/Frameworks/R.framework/Resources/lib/
ln -s -i -v libRblas.vecLib.dylib libRblas.dylib

An explanation can be found here: https://www.mail-archive.com/r-sig-mac@r-project.org/msg06199.html
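(A general way to check which BLAS R is actually using, not from the linked post:)

sessionInfo()   # the "BLAS:" line reports the BLAS library R is running with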