lme4 / lme4

Mixed-effects models in R using S4 classes and methods with RcppEigen

getData.merMod(): ! 'data' object found is not a data frame or matrix #808

Open tamelung opened 2 months ago

tamelung commented 2 months ago

What I did.

I fitted GLMMs on a high-performance computing cluster and imported them into my local RStudio session as RDS files. Now I am trying to produce diagnostic plots following the procedures laid out in Mixed model lab #1. I run the following code:

qqmath(m3Corr)
plot(m3Corr,ID~resid(.))

What happened.

I get these error messages:

Error in getData.merMod(object) : ‘data’ object found is not a data frame or matrix

What I expected to happen.

I expected to get the diagnostic plots.

potential problem / solution

In https://github.com/lme4/lme4/blob/2f37b6dcac9fea9ea81f7980e2c23a61cde73368/R/plot.R#L31 getData.merMod seems to try to get the data from the environment. In my case, where the models were fitted on a different machine, this doesn't work. After I read in the data under the exact same name as used in the original call, the plots seem fine.
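The workaround described above can be sketched as a round trip through RDS files. This is only an illustration under assumptions: the built-in sleepstudy data stands in for the real dataset, temporary files stand in for the files copied from the cluster, and the model formula is made up. The one detail taken from the report is that the data frame is restored under the same name (here `data`) that appeared in the original model call.

```r
library(lme4)
library(lattice)

## --- "cluster" side (simulated locally; sleepstudy is a stand-in dataset) ---
data <- sleepstudy  # deliberately named `data`, as in the report
m3Corr <- lmer(Reaction ~ Days + (Days | Subject), data = data)

model_file <- tempfile(fileext = ".rds")  # hypothetical file locations
data_file  <- tempfile(fileext = ".rds")
saveRDS(m3Corr, model_file)
saveRDS(data, data_file)
rm(m3Corr, data)

## --- local side: restore the model AND the data under the original name ---
m3Corr <- readRDS(model_file)
data   <- readRDS(data_file)

## with `data` available again, the diagnostic plot can look it up
qqmath(m3Corr)
```

Saving the data alongside the model keeps the two in sync; whether the lookup succeeds in a given workflow still depends on where the formula was defined.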

sessionInfo()

RStudio (Version 2024.04.2+764)

R version 4.3.3 (2024-02-29)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.4.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] lme4_1.1-35.3 lattice_0.22-6 interactions_1.1.5 performance_0.12.2 effectsize_0.8.8
 [6] emmeans_1.10.1 data.table_1.15.4 bruceR_2024.6 gridExtra_2.3 broom.mixed_0.2.9.5
[11] tibble_3.2.1 moments_0.14.1 multcomp_1.4-25 TH.data_1.1-2 MASS_7.3-60.0.1
[16] survival_3.6-4 mvtnorm_1.2-4 tidyr_1.3.1 stringr_1.5.1 readxl_1.4.3
[21] dplyr_1.1.4 ggplot2_3.5.1 Matrix_1.6-5

bbolker commented 2 months ago

We need a little bit more information.

library(lme4)
fitfun <- function(form) {
   set.seed(101)
   dd <- data.frame(x = rnorm(1000), f = factor(rep(1:20, each = 50)))
   dd$y <- simulate( ~ x + (1|f), newdata = dd, newparams = list(beta = c(1,1), theta = 1, sigma = 1), family = gaussian)[[1]]
   fit <-  lmer (form, data = dd)
   return(fit)
}
## construct formula *outside* of environment where data exist
form <- y ~ x + (1|f) 
m <- fitfun(form)
library(lattice)
qqmath(m) ## object 'dd' not found

I could provoke your error if I use the name data instead of dd for the data frame generated inside the function. In that case, rather than failing to find a data frame, the lookup finds the built-in function data() ...
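A small base-R sketch of why the name `data` produces this particular message. It does not call lme4; the environment `lookup_env` is a hypothetical stand-in for a session in which no data frame called `data` exists, and the final check only mimics the wording of the error, not the package's actual code.

```r
## a clean environment whose parent is the attached utils package,
## standing in for a session with no user-defined object called `data`
lookup_env <- new.env(parent = as.environment("package:utils"))

## the lookup does not fail: it falls through to utils::data(), a function
found <- get("data", envir = lookup_env)

is.function(found)
is.data.frame(found) || is.matrix(found)

## by contrast, a name that exists nowhere fails with "object not found"
res <- tryCatch(get("dd_that_does_not_exist", envir = lookup_env),
                error = function(e) conditionMessage(e))
res
```

So a generic name like `data` turns a missing-object problem into a wrong-type problem, which is what makes the message misleading here.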

The fundamental problem is that R model objects don't typically carry around a full copy of the data; they do carry a model frame, but that doesn't necessarily have the original variables in it (for example, if a model uses log(x), the model frame will have log(x) in it rather than x).
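The point about model frames can be checked with a few lines of base R. This sketch uses lm() rather than lmer() purely to avoid a package dependency, and the simulated data are made up for illustration.

```r
set.seed(1)
dd <- data.frame(x = runif(50, min = 1, max = 10))
dd$y <- 2 + 3 * log(dd$x) + rnorm(50)

fit <- lm(y ~ log(x), data = dd)
mf <- model.frame(fit)

## the model frame stores the transformed term, not the raw variable
names(mf)
"x" %in% names(mf)
```

Anything that needs the raw `x` (for example a plot of residuals against `x`) therefore has to go back to the original data frame.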

We could certainly (1) try harder to find the data (maybe looking in the model frame in case it does have what we need), or (2) try harder not to need the data. For example, qqmath(m) shouldn't need to access the data at all. plot.merMod is a little bit smarter: it only tries to get the data if the formula refers to a variable that needs to be drawn from the data. In the above example plot(m, residuals(.) ~ fitted(.)) works fine; we only run into trouble when we try plot(m, residuals(.) ~ x).

Can you say a little bit more about where the data and formula are constructed in your workflow so we can see if there's a way to make this work for you?

tamelung commented 2 months ago

Thank you for the quick reply! I am working on two machines. I use my local machine for data preparation, descriptives, and model interpretation and a remote cluster for the computation of the GLMMs. I proceeded like this: On my local machine:

Just for clarity, after this last step I was planning to import the model with the optimised RE structure back into my main analysis environment for inference.

Is this the information you needed?

bbolker commented 2 months ago

Are you OK with your workaround? Whether the data get carried along in the environment of the formula or not (i.e. whether getData.merMod works automatically) depends somewhat on the details of where (i.e., in what environment) the formula is defined (things can get a bit weird, for example, if you set up the formula as a character string and it gets automatically converted inside the machinery, rather than defining it as a formula object in the first place). Making it all work seamlessly in all possible workflow permutations is challenging ...
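To make the role of the formula's environment concrete, here is a base-R sketch. The helper names (`make_formula`, `convert_inside`, `dd_local`) are hypothetical; `convert_inside()` merely stands in for model-fitting code that converts a character string to a formula internally and is not how lme4 itself is written.

```r
## a formula created inside a function remembers that function's frame,
## so objects defined there stay reachable through the formula
make_formula <- function() {
  dd_local <- data.frame(x = 1:3, y = c(2, 4, 6))
  y ~ x
}
f_inside <- make_formula()
exists("dd_local", envir = environment(f_inside), inherits = FALSE)

## a formula written at the call site points at the calling environment
f_outside <- y ~ x

## a string converted inside another function picks up that function's frame
convert_inside <- function(txt) as.formula(txt)
f_string <- convert_inside("y ~ x")
identical(environment(f_string), environment(f_outside))
```

The three formulas look identical when printed, yet each would lead a data lookup to a different place.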

tamelung commented 2 months ago

Yes, I am. I was wondering whether the error message could be improved for the case where the dataset cannot be found in the environment. It took me some time to figure this out, especially as my dataset had the rather generic name ‘data’. Also, I first tried to fix or manipulate the data stored in the object: from the message, I understood that it might be in the wrong format, and I figured it had somehow been altered in the process of storing and re-reading it on different systems.

Till Amelung (er / he)
