pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation

Very high memory requirements for powerSim #179

Closed · schnarrd closed 4 years ago

schnarrd commented 4 years ago

I'm trying to do a power analysis for a project where we expect 40,000 observations (at least) across 40-60 "units".

When I do this power analysis using a much smaller number of observations within units (say, 100 per unit), the simulation runs fine. However, memory requirements seem to explode when I up the observations per unit to 300 or 500. I haven't been able to get the simulation to run even on a computer with a memory limit of 70GB.

What's going on? Why are the memory requirements so high and how would I make them more manageable? The generic power simulation script is below, for reference.

```r
# psacr001_power.R: Run a job using specifications determined by a shell script

# Get the arguments passed from the shell script
args <- commandArgs(trailingOnly = TRUE)
job            <- as.numeric(args[[1]])
sample_size    <- as.numeric(args[[2]])
number_of_labs <- as.numeric(args[[3]])
icc_size       <- as.numeric(args[[4]])
slope_size     <- as.numeric(args[[5]])
effect         <- as.numeric(args[[6]])

# Required R libraries
library(lme4)
library(simr)
library(pbkrtest)
library(tibble)

# Create dataset
simulated.df <- tibble(
  x = sample(rep(0:1, sample_size * number_of_labs / 2)),  # group assignment
  g = rep(1:number_of_labs, sample_size),                  # cluster membership
  y = rnorm(sample_size * number_of_labs)                  # outcome
)

# Create simple model
model.of.interest <- lmer(y ~ x + (x | g), data = simulated.df)

# Create a small effect of interest
fixef(model.of.interest)['x'] <- effect

# Set the random-intercept variance (ICC)
VarCorr(model.of.interest)["g"][["g"]][1] <- icc_size

# Set the random-slope variance
VarCorr(model.of.interest)["g"][["g"]][4] <- slope_size

# Try not to be singular
attr(VarCorr(model.of.interest)[["g"]], "correlation")[2:3] <- .3

power_summary <- tibble(
  job       = numeric(),
  successes = numeric(),
  trials    = numeric(),
  mean      = numeric(),
  lower     = numeric(),
  upper     = numeric()
)

# Simulate those models
temp <- powerSim(fit = model.of.interest, nsim = 200, progress = FALSE)
power_summary[1, 1]   <- job
power_summary[1, 2:6] <- summary(temp)

write.csv(power_summary, paste(c("res", job, ".csv"), collapse = ""), row.names = FALSE)
```

pitakakariki commented 4 years ago

First guess is the model size is a bit much for the KR test. Does it work if you specify

```r
test = fixed("x", "z")
```

instead of relying on the default?
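
For reference, the full call would then look something like this (a sketch reusing `model.of.interest` and `nsim = 200` from the script above):

```r
# Use a z-test for the fixed effect "x" instead of the default
# Kenward-Roger approximation
temp <- powerSim(
  fit      = model.of.interest,
  test     = fixed("x", "z"),
  nsim     = 200,
  progress = FALSE
)
```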

schnarrd commented 4 years ago

Thanks, that seems to have done the trick.

If I understand correctly, the default was using a more computationally expensive way of computing p-values? Sorry, I'm just trying to figure out what was actually happening here.

pitakakariki commented 4 years ago

Yes, the default in this case was to use the Kenward-Roger approximation from pbkrtest.
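
For anyone landing here later, a sketch of the relevant `fixed()` test methods (see `?simr::fixed` for the full list):

```r
fixed("x", "kr")  # Kenward-Roger via pbkrtest: the default here; accurate
                  # for small samples, but memory-hungry for large models
fixed("x", "z")   # z-test: cheap, reasonable with ~40,000 observations
fixed("x", "lr")  # likelihood-ratio test: another inexpensive alternative
```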