marangiop opened this issue 2 days ago
Hello,

I am working with a large dataset spread over 300 files, each of which I can load into R as a tibble. After scaling some of the columns, I use glmer to fit the model as below:

withkin_basic2 <- glmer(withkin_ind ~ (1 | combined_group) + (1 | SAMPLE),
                        na.action = na.omit, family = binomial(link = "logit"),
                        weights = weight, data = elder_clean)

Since I am not able to load the 300 files into memory as a single tibble, I would like to fit the model in a parallel fashion, namely using distributed-memory computing (just as @bbolker suggested in the StackOverflow thread below, from 2015). I have been googling this topic and have come across some GitHub issues and StackOverflow threads related to LMMs and GLMMs. Some of them are quite old, and I haven't been able to find any clear code example in R or Python detailing a potential solution for fitting a GLMM in a distributed-memory fashion.

Resources

https://stackoverflow.com/questions/31452801/fitting-a-linear-mixed-model-to-a-very-large-data-set
https://github.com/lme4/lme4/issues/387
https://github.com/fulyagokalp/lmmpar
https://github.com/linkedin/photon-ml

Can somebody please advise? Is photon-ml the solution?
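The per-file loading step described at the top of the question is not shown above; a minimal sketch of what it could look like, assuming CSV shards. The file name, the readr/dplyr calls, and the columns chosen for scaling (age, income) are illustrative placeholders; only the glmer() call comes from the post.

```r
## Hypothetical per-file workflow: load one shard as a tibble, scale
## some columns, fit the GLMM. "shard_001.csv", age, and income are
## made-up placeholders, not names from the original data.
library(readr)
library(dplyr)
library(lme4)

elder_clean <- read_csv("shard_001.csv") |>
  mutate(across(c(age, income), ~ as.numeric(scale(.x))))

withkin_basic2 <- glmer(
  withkin_ind ~ (1 | combined_group) + (1 | SAMPLE),
  na.action = na.omit,
  family = binomial(link = "logit"),
  weights = weight,
  data = elder_clean
)
```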
I don't know the distributed-memory systems well (barely at all). lmmpar doesn't look useful, at least as currently written (it does not do distributed-memory computing). photon-ml looks like it might work, although from a quick look at its introductory materials I can't tell whether it actually does shrinkage estimation for what it calls 'random effects' (the presentation doesn't go into enough technical detail and I don't want to dig deeper). sparklyr might be the most promising route; if you want to go that way and want help building the solution for yourself (starting here, I think), I will help you. (I could write an example that ran [for a model where the data did fit in memory] in parallel across shards, if you were willing to do the work of adapting it to Spark ...)
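A minimal sketch of the kind of shard-parallel example offered above, under the assumption that any single file fits in memory on its own: fit the same GLMM independently on each of the 300 shards with parallel::mclapply(), then pool the fixed-effect estimates by inverse-variance weighting. The model formula comes from the question; the file pattern, core count, and pooling rule are illustrative assumptions, not part of the original offer.

```r
## Sketch: fit the same GLMM on every shard in parallel, then pool the
## fixed effects across shards by inverse-variance weighting (a simple
## meta-analysis style combination; an illustrative choice, not an
## exact substitute for fitting one model to all of the data at once).
library(lme4)
library(parallel)

files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)  # the 300 shards

fit_one <- function(path) {
  shard <- read.csv(path)
  glmer(withkin_ind ~ (1 | combined_group) + (1 | SAMPLE),
        na.action = na.omit,
        family = binomial(link = "logit"),
        weights = weight,
        data = shard)
}

## mclapply forks worker processes; on Windows it needs mc.cores = 1.
fits <- mclapply(files, fit_one, mc.cores = 8)

## Pool each fixed effect: weight each shard's estimate by 1 / SE^2.
est <- do.call(cbind, lapply(fits, fixef))                            # p x 300
se  <- do.call(cbind, lapply(fits, function(f) sqrt(diag(vcov(f)))))
w   <- 1 / se^2
pooled <- data.frame(estimate = rowSums(est * w) / rowSums(w),
                     se       = sqrt(1 / rowSums(w)))
pooled
```

Because each shard fit is independent, the same pattern could in principle be pushed onto Spark executors (for example via sparklyr::spark_apply()). Note that pooling of this kind ignores between-shard differences in the estimated random-effect variances, which is part of why adapting it properly to Spark is the real work.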
Just to clarify, I am not doing a PhD :)