pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation
68 stars 19 forks source link

'long vectors not supported yet' error #230

Open Shotgunosine opened 2 years ago

Shotgunosine commented 2 years ago

I'm trying to use simr with a dataset and model from my experiments and it's generating long vectors not supported yet: subassign.c:1818 errors for every attempted model fit.

Here is a reproducible sample with a subset of the data prepped_mdat.csv:

mdat <- read.csv('prepped_mdat.csv')
mdl <- lmer('s_mfq_tot ~ MFQtminus1 + TimeBetween +(TimeBetween | SDAN) + (1|vid)', data=mdat)
summary(mdl)
powerSim(mdl, nsim=20)

Which produces:

confidence interval):
       0.00% ( 0.00, 16.84)

Test: unknown test
      Effect size for MFQtminus1 is 0.77

Based on 20 simulations, (0 warnings, 20 errors)
alpha = 0.05, nrow = 201

Time elapsed: 0 h 0 m 0 s

nb: result might be an observed power calculation

All of the errors are 'long vectors not supported yet: subassign.c:1818'.

When I run the example in the documentation, it appears to work fine:

subj <- factor(1:10)
class_id <- letters[1:5]
time <- 0:2
group <- c("control", "intervention")

subj_full <- rep(subj, 15)
class_full <- rep(rep(class_id, each=10), 3)
time_full <- rep(time, each=50)
group_full <- rep(rep(group, each=5), 15)

covars <- data.frame(id=subj_full, class=class_full, treat=group_full, time=factor(time_full))
## Intercept and slopes for intervention, time1, time2, intervention:time1, intervention:time2
fixed <- c(5, 0, 0.1, 0.2, 1, 0.9)

## Random intercepts for participants clustered by class
rand <- list(0.5, 0.1)

## residual variance
res <- 2

model <- makeLmer(y ~ treat*time + (1|class/id), fixef=fixed, VarCorr=rand, sigma=res, data=covars)
sim_treat <- powerSim(model, nsim=100, test = fcompare(y~time))

Which produces:

Power for model comparison, (95% confidence interval):
      51.00% (40.80, 61.14)

Test: Likelihood ratio
      Comparison to y ~ time + [re]

Based on 100 simulations, (0 warnings, 0 errors)
alpha = 0.05, nrow = 150

Time elapsed: 0 h 0 m 12 s

Additionally, here's my session info in case it's useful:

R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /gpfs/gsfs11/users/MBDU/nielsond/fhist/env/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] simr_1.0.5     lmerTest_3.1-3 lme4_1.1-27.1  Matrix_1.3-4  

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1    purrr_0.3.4         splines_4.1.1      
 [4] haven_2.4.3         lattice_0.20-44     carData_3.0-4      
 [7] colorspace_2.0-2    vctrs_0.3.8         generics_0.1.0     
[10] mgcv_1.8-36         utf8_1.2.2          rlang_0.4.11       
[13] nloptr_1.2.2.2      pillar_1.6.2        foreign_0.8-81     
[16] glue_1.4.2          readxl_1.3.1        plyr_1.8.6         
[19] binom_1.1-1         lifecycle_1.0.0     stringr_1.4.0      
[22] munsell_0.5.0       gtable_0.3.0        cellranger_1.1.0   
[25] zip_2.2.0           rio_0.5.27          forcats_0.5.1      
[28] RLRsim_3.1-6        parallel_4.1.1      pbkrtest_0.5.1     
[31] curl_4.3.2          fansi_0.4.2         broom_0.7.9        
[34] Rcpp_1.0.7          backports_1.2.1     scales_1.1.1       
[37] plotrix_3.8-2       abind_1.4-5         ggplot2_3.3.5      
[40] hms_1.1.0           stringi_1.7.4       openxlsx_4.2.4     
[43] dplyr_1.0.7         numDeriv_2016.8-1.1 grid_4.1.1         
[46] magrittr_2.0.1      tibble_3.1.4        tidyr_1.1.3        
[49] crayon_1.4.1        car_3.0-11          pkgconfig_2.0.3    
[52] MASS_7.3-54         ellipsis_0.3.2      data.table_1.14.0  
[55] minqa_1.2.4         iterators_1.0.13    R6_2.5.1           
[58] boot_1.3-28         nlme_3.1-153        compiler_4.1.1     

Any help you can provide would be greatly appreciated.

pitakakariki commented 2 years ago

I'm afraid I can't reproduce the problem, it runs without error for me.

We might be able to pin it down by going step-by-step, i.e.:

y <- doSim(mdl)
mdl2 <- doFit(y, mdl)
doTest(mdl2)

I see that the printout says "unknown test" which is a little suspicious. Maybe if the test was specified explicitly it might work? i.e. doTest(mdl2, fixed("MFQtminus1", "kr")).

Shotgunosine commented 2 years ago

It looks like it's failing in the fit:

mdat <- read.csv('prepped_mdat.csv')
mdl <- lmer('s_mfq_tot ~ MFQtminus1 + TimeBetween +(TimeBetween | SDAN) + (1|vid)', data=mdat)
summary(mdl)
#powerSim(mdl, nsim=20)
y <- doSim(mdl)
mdl2 <- doFit(y, mdl)

produces

Error in newCall[["formula"]][[2]] <- responseForm : 
  long vectors not supported yet: subassign.c:1818

Since it works for you, do you think it might be a dependency/version issue, what was the session info you ran this under?

pitakakariki commented 2 years ago

That's quite strange, not where I would have expected it to break down.

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_New Zealand.1252  LC_CTYPE=English_New Zealand.1252   
[3] LC_MONETARY=English_New Zealand.1252 LC_NUMERIC=C                        
[5] LC_TIME=English_New Zealand.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] simr_1.0.5    lme4_1.1-27.1 Matrix_1.3-4 

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1  xfun_0.24         purrr_0.3.4       splines_4.1.1     haven_2.4.1      
 [6] lattice_0.20-44   carData_3.0-4     vctrs_0.3.8       generics_0.1.0    htmltools_0.5.1.1
[11] yaml_2.2.1        mgcv_1.8-36       utf8_1.2.2        rlang_0.4.11      nloptr_1.2.2.2   
[16] pillar_1.6.1      foreign_0.8-81    glue_1.4.2        DBI_1.1.1         readxl_1.3.1     
[21] plyr_1.8.6        binom_1.1-1       lifecycle_1.0.0   stringr_1.4.0     cellranger_1.1.0 
[26] zip_2.2.0         evaluate_0.14     knitr_1.33        rio_0.5.27        forcats_0.5.1    
[31] RLRsim_3.1-6      parallel_4.1.1    pbkrtest_0.5.1    curl_4.3.2        fansi_0.5.0      
[36] broom_0.7.8       Rcpp_1.0.7        backports_1.2.1   plotrix_3.8-1     abind_1.4-5      
[41] hms_1.1.0         digest_0.6.27     stringi_1.7.3     openxlsx_4.2.4    dplyr_1.0.7      
[46] grid_4.1.1        tools_4.1.1       magrittr_2.0.1    tibble_3.1.3      tidyr_1.1.3      
[51] crayon_1.4.1      car_3.0-11        pkgconfig_2.0.3   MASS_7.3-54       ellipsis_0.3.2   
[56] data.table_1.14.0 assertthat_0.2.1  minqa_1.2.4       rmarkdown_2.9     iterators_1.0.13 
[61] R6_2.5.0          boot_1.3-28       nlme_3.1-152      compiler_4.1.1   
> 
pitakakariki commented 2 years ago

Tried it on the cloud too with no errors:

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] simr_1.0.5    lme4_1.1-27.1 Matrix_1.3-3 

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1  purrr_0.3.4       splines_4.1.0     haven_2.4.3       lattice_0.20-44   carData_3.0-4    
 [7] vctrs_0.3.8       generics_0.1.0    mgcv_1.8-35       utf8_1.2.2        rlang_0.4.11      nloptr_1.2.2.2   
[13] pillar_1.6.2      foreign_0.8-81    glue_1.4.2        readxl_1.3.1      plyr_1.8.6        binom_1.1-1      
[19] lifecycle_1.0.0   stringr_1.4.0     cellranger_1.1.0  zip_2.2.0         rio_0.5.27        forcats_0.5.1    
[25] RLRsim_3.1-6      pbkrtest_0.5.1    curl_4.3.2        parallel_4.1.0    fansi_0.5.0       broom_0.7.9      
[31] Rcpp_1.0.7        backports_1.2.1   plotrix_3.8-2     abind_1.4-5       hms_1.1.0         stringi_1.7.4    
[37] openxlsx_4.2.4    dplyr_1.0.7       grid_4.1.0        tools_4.1.0       magrittr_2.0.1    tibble_3.1.4     
[43] crayon_1.4.1      car_3.0-11        tidyr_1.1.3       pkgconfig_2.0.3   MASS_7.3-54       ellipsis_0.3.2   
[49] data.table_1.14.0 minqa_1.2.4       iterators_1.0.13  R6_2.5.1          boot_1.3-28       nlme_3.1-152     
[55] compiler_4.1.0   
Shotgunosine commented 2 years ago

Huh, looks like it's lmerTest's fault. (btw, I added some rows to the test data to avoid singular fit warnings: prepped_mdat.csv)

library("lmerTest")
library("lme4")
library("simr")
mdat <- read.csv('prepped_mdat.csv')
mdl <- lmer('s_mfq_tot ~ MFQtminus1 +(1 | SDAN)', data=mdat)
summary(mdl)
print(powerSim(mdl, nsim=20))

Produces 20 errors again:

Power for predictor 'MFQtminus1', (95% confidence interval):
       0.00% ( 0.00, 16.84)

Test: unknown test
      Effect size for MFQtminus1 is 0.67

Based on 20 simulations, (0 warnings, 20 errors)
alpha = 0.05, nrow = 501

Time elapsed: 0 h 0 m 0 s

nb: result might be an observed power calculation

But if I detach lmerTest it works fine:

detach("package:lmerTest")
mdl <- lmer('s_mfq_tot ~ MFQtminus1 +(1 | SDAN)', data=mdat)
summary(mdl)
print(powerSim(mdl, nsim=20)

Power for predictor 'MFQtminus1', (95% confidence interval):
      100.0% (83.16, 100.0)

Test: Kenward Roger (package pbkrtest)
      Effect size for MFQtminus1 is 0.67

Based on 20 simulations, (0 warnings, 0 errors)
alpha = 0.05, nrow = 501

Time elapsed: 0 h 0 m 4 s

nb: result might be an observed power calculation

Digging into things a little bit more, I took apart the doFit function and it looks like the issue is that the call object returned by getCall on an lmerModLmerTest object has a character string in its formula slot, whereas, getCall on an lmerMod object returns a formula object in its formula slot.

library("lmerTest")
library("lme4")
library("simr")
mdat <- read.csv('prepped_mdat.csv')
fit <- lmer('s_mfq_tot ~ MFQtminus1 +(1 | SDAN)', data=mdat)
summary(fit)
y <- doSim(fit)
responseName <- formula(fit)[[2]]
# cbind response for binomial
if(as.character(responseName)[1] == "cbind") {

    responseForm <- responseName

    responseName <- responseName[[2]]
    if(is.matrix(y)) y <- y[, 1]

} else {

    if(!is.character(responseName)) responseName <- deparse(responseName)
    responseName <- make.names(responseName)

    responseForm <- as.symbol(responseName)
}
newData <- getData(fit)
newData[[responseName]] <- y
newCall <- getCall(fit)
print(class(fit)) # "lmerModLmerTest"
print(getCall(fit)) # lmer(formula = "s_mfq_tot ~ MFQtminus1 +(1 | SDAN)", data = mdat)
print(class(getCall(fit))) # "call"
print(newCall[["formula"]]) # "s_mfq_tot ~ MFQtminus1 +(1 | SDAN)"
print(class(newCall[["formula"]])) # "character"

print("now detach lmerTest")
detach("package:lmerTest")
mdat <- read.csv('prepped_mdat.csv')
fit <- lmer('s_mfq_tot ~ MFQtminus1 +(1 | SDAN)', data=mdat)
summary(fit)
y <- doSim(fit)
responseName <- formula(fit)[[2]]
# cbind response for binomial
if(as.character(responseName)[1] == "cbind") {

    responseForm <- responseName

    responseName <- responseName[[2]]
    if(is.matrix(y)) y <- y[, 1]

} else {

    if(!is.character(responseName)) responseName <- deparse(responseName)
    responseName <- make.names(responseName)

    responseForm <- as.symbol(responseName)
}
newData <- getData(fit)
newData[[responseName]] <- y
newCall <- getCall(fit)
print(class(fit)) # "lmerMod"
print(getCall(fit)) # lmer(formula = s_mfq_tot ~ MFQtminus1 + (1 | SDAN), data = mdat)
print(class(getCall(fit))) # "call"
print(newCall[["formula"]]) # s_mfq_tot ~ MFQtminus1 + (1 | SDAN) <environment: 0x55556d15c1e0>
print(class(newCall[["formula"]])) # "formula"

I primarily work in Python, so I don't really know how I'd go about fixing the issue, but for the time being, it seems like avoiding lmerTest is an acceptable workaround.

pitakakariki commented 2 years ago

Thanks for tracking down the problem. I'll be adding some more unit testing around lmerTest for the next release.