pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation

Taking too much time to run the code #223

Open sangheek16 opened 3 years ago

sangheek16 commented 3 years ago

Hi,

I've extended the model along the number of participants and ran powerCurve with nsim=100. It has been running for more than 72 hours and still hasn't finished. With the same code, nsim=10 took 5 min and nsim=20 took 11 h 25 min.

Is this normal? Does the computation time increase exponentially with nsim?

Below is the model I used. logRT is a continuous variable; Grammaticality.f, Distractor.f, and Clause.f are each categorical with two levels; RT2 and WordLength are continuous.

```r
library(lme4)  # provides lmer()

model <- lmer(logRT ~ Grammaticality.f * Distractor.f * Clause.f + RT2 + WordLength +
                (Distractor.f + Clause.f + Grammaticality.f | Participant) +
                (Distractor.f + Clause.f + Grammaticality.f | Item),
              data = so1)
```

I've extended the model along the number of participants.

```r
library(simr)  # provides extend(), powerCurve() and powerSim()

model_ext <- extend(model, along = "Participant", n = 1000)
```

I then ran powerCurve with nsim=100.

```r
pc <- powerCurve(model_ext,
                 test = fixed("Grammaticality.f1:Distractor.f1:Clause.f1", method = "z"),
                 along = "Participant", nsim = 100,
                 breaks = c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000))
```

I wasn't sure whether this is a problem with my code or whether I should expect the running time to increase exponentially as I increase nsim.

Thanks for your help!

pitakakariki commented 3 years ago

I would normally expect it to increase linearly.

Does powerSim for a single sample size scale properly for you?

powerCurve does all the simulations at the start, so it's possible this is a memory issue. How many observations per participant are there in your data?

sangheek16 commented 3 years ago

Thanks for your help!

Just to clarify your first question: could you explain what you mean by "a single sample size" and "scale properly"?

If your second question is about the number of rows per participant, there are 46 rows per participant in the data frame.

pitakakariki commented 3 years ago

If you use powerSim instead of powerCurve, can you increase the number of simulations without the time increasing unreasonably?

sangheek16 commented 3 years ago

I would say the time increases quite reasonably if I use powerSim: nsim=20 took 1 h 28 min 13 s, and nsim=100 took 7 h 46 min 54 s. For comparison, powerCurve took about 11 h with nsim=20, and with nsim=100 it ran for more than 72 h without finishing. Could something be going on with powerCurve?
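One way to check the "linear vs exponential" question with the numbers reported in this thread is to divide each wall-clock time by its nsim: under linear scaling, the seconds per simulation should stay roughly constant for a given function. A minimal base-R sketch using only the timings quoted above (the unfinished 72 h powerCurve run is left out because it has no end time):

```r
# Convert a duration given in hours/minutes/seconds to seconds
to_sec <- function(h = 0, m = 0, s = 0) h * 3600 + m * 60 + s

# Wall-clock times reported in this issue
timings <- data.frame(
  fn   = c("powerCurve", "powerCurve", "powerSim", "powerSim"),
  nsim = c(10, 20, 20, 100),
  secs = c(to_sec(m = 5), to_sec(h = 11, m = 25),
           to_sec(h = 1, m = 28, s = 13), to_sec(h = 7, m = 46, s = 54))
)

# Under linear scaling this column should be roughly constant within each function
timings$secs_per_sim <- timings$secs / timings$nsim
print(timings)
```

powerSim stays at roughly 265 to 280 s per simulation, which is consistent with linear scaling, whereas powerCurve jumps from 30 s to about 2055 s per simulation when nsim merely doubles. The powerCurve runs each cover ten sample sizes, so their per-simulation cost is not directly comparable to a single-size powerSim run; only the trend within each function is meaningful here.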

pitakakariki commented 3 years ago

Sounds like it's a memory issue: the package was designed for ecologists, so I didn't expect people to run such large models. I think you'll need to run each sample size separately with powerSim.
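Following that suggestion, the per-size runs could be organised as a loop that extends the pilot model to each target number of participants, calls powerSim on it, and saves each result to disk as soon as it finishes, so that a crash or a killed session many hours in does not lose the completed sizes. This is a hedged sketch, not something provided by simr: the helper name `power_by_size`, its `out_dir` argument and the `.rds` file naming are hypothetical. It assumes `extend()` behaves for every size as it did for n = 1000 in the original post (i.e. each size is at least the number of participants in the pilot data), and it calls `gc()` between sizes only on the assumption that memory is the bottleneck.

```r
# Hypothetical helper (not part of simr): run powerSim separately for each
# number of participants instead of making a single powerCurve call.
power_by_size <- function(model, sizes, test, nsim = 100,
                          along = "Participant", out_dir = "power_results") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  results <- list()
  for (n in sizes) {
    # Build a model with n levels of `along` (assumes n >= levels in the pilot data)
    model_n <- simr::extend(model, along = along, n = n)
    ps <- simr::powerSim(model_n, test = test, nsim = nsim)
    # Save each finished size immediately so a long run is not lost
    saveRDS(ps, file.path(out_dir, sprintf("powersim_n%04d.rds", n)))
    results[[as.character(n)]] <- ps
    # Drop the large extended model before moving on to the next size
    rm(model_n)
    gc()
  }
  results
}

# Example call (requires simr and the fitted `model` from the original post):
# library(simr)
# res <- power_by_size(
#   model,
#   sizes = seq(100, 1000, by = 100),
#   test  = fixed("Grammaticality.f1:Distractor.f1:Clause.f1", method = "z"),
#   nsim  = 100
# )
# res[["500"]]   # print the power estimate for 500 participants
```

Because each size is an independent powerSim call, the sizes can also be split across separate R sessions or machines and the saved `.rds` files combined afterwards.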

sangheek16 commented 3 years ago

I'll work with powerSim with each sample size. Thanks a lot for your help, really appreciate it!