lme4 / lme4

Mixed-effects models in R using S4 classes and methods with RcppEigen

Fitting a generalized linear mixed model to a very large data set #806

Open marangiop opened 2 days ago

marangiop commented 2 days ago

Hello,

I am working with a large dataset spread over 300 files. Here is an example where I load one of these files into R as a tibble:

> elder_clean
# A tibble: 4,094,925 x 31
# Groups:   SAMPLE, subnational [5,406]
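For context, the loading step looks roughly like this; the file name and reading function are placeholders for what I actually use:

library(readr)
library(dplyr)

# Hypothetical loading step for one of the 300 files;
# "elder_001.csv" is a placeholder file name.
elder_clean <- read_csv("elder_001.csv") %>%
  group_by(SAMPLE, subnational)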

After scaling some of the columns (a sketch of that step follows the model call below), I would then use glmer() to fit the model:

withkin_basic2 <- glmer(
  withkin_ind ~ (1 | combined_group) + (1 | SAMPLE),
  family = binomial(link = "logit"),
  weights = weight,
  na.action = na.omit,
  data = elder_clean
)
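The scaling step mentioned above is along these lines; the column names here are placeholders for the covariates I actually scale:

library(dplyr)

# Hypothetical scaling step; age and hhsize are placeholder column names.
elder_clean <- elder_clean %>%
  mutate(across(c(age, hhsize), ~ as.numeric(scale(.x))))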

Since I am not able to load all 300 files into memory as a single tibble, I would like to implement this in a distributed-memory fashion (as @bbolker suggested in the StackOverflow thread below from 2015). I have been searching this topic and have come across some GitHub issues and StackOverflow threads related to LMMs and GLMMs. Some of the threads below are quite old, and I haven't been able to find any clear code examples in R or Python detailing a potential solution for fitting a GLMM with distributed memory.
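To make the question concrete, the only workaround I can sketch myself is a "divide and recombine" approach: fit the same GLMM to each file independently, then pool the fixed-effect estimates by inverse-variance weighting. I understand this is only an approximation to fitting one model on the pooled data (the variance components are not pooled properly), and all paths and names below are placeholders:

library(lme4)

files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)

# Fit the GLMM to a single file and return the intercept estimate
# and its standard error.
fit_one <- function(path) {
  d <- readr::read_csv(path)
  m <- glmer(
    withkin_ind ~ (1 | combined_group) + (1 | SAMPLE),
    family = binomial(link = "logit"),
    weights = weight,
    na.action = na.omit,
    data = d
  )
  c(est = fixef(m)[["(Intercept)"]],
    se  = sqrt(as.matrix(vcov(m))[1, 1]))
}

# Each file fits in memory on its own; lapply() could be swapped for
# parallel::mclapply() or a cluster map to distribute the work.
res <- do.call(rbind, lapply(files, fit_one))

# Inverse-variance (fixed-effect meta-analysis) pooling of the intercept.
w <- 1 / res[, "se"]^2
pooled_est <- sum(w * res[, "est"]) / sum(w)
pooled_se  <- sqrt(1 / sum(w))

Is something along those lines reasonable, or is there a better-established route?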

Resources

Can somebody please advise? Is photon-ml the solution?

Just to clarify, I am not doing a PhD :)

bbolker commented 2 days ago

I don't know distributed-memory systems well (barely at all).