pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation

Confounded variables (was: For the models with two random effects) #80

Open SCgeeker opened 7 years ago

SCgeeker commented 7 years ago

Hi, this is a great package for estimating power and sample size in language studies that analyse data with mixed-effects models. I am estimating power from the statistics reported in a paper: only the means and standard deviations of the two critical conditions are available, together with comparisons between two mixed-effects models.

My full script is in this public gist. My idea is to create simulated data from the condition means and standard deviations. The mixed-effects models are then extended along the number of items and then along the number of subjects. Although I got reasonable power estimates that match the description in the original paper, the power does not change when I increase the number of subjects. Is there something wrong with how I set up the models?

By the way, powerCurve can crash when nsim is set to 1,000. Is there a limitation when running power analysis on models with two random effects?

thanks!

pitakakariki commented 7 years ago

I'm not sure what sort of crash you're getting for nsim=1000, but it should be okay. You might be running out of memory, or it could just be taking a long time to run.

pitakakariki commented 7 years ago

The problem with power not increasing looks like it might have to do with the ITEM random effect.

Note that the items with CONSISTENCY==1 are distinct from the items with CONSISTENCY==2. This means that a difference in response could either be due to a CONSISTENCY fixed effect or due to ITEM random effects. If the design doesn't allow the model to distinguish these, you will get low power.
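A minimal sketch of the two layouts (column names are illustrative, not taken from the gist) shows why the nested design confounds the fixed effect with the item effects:

```r
# Nested (confounded): each item appears in only one CONSISTENCY
# condition, so a CONSISTENCY fixed effect and ITEM random effects
# explain the same between-condition differences.
nested <- expand.grid(SUBJECT = 1:20, ITEM = 1:10)
nested$CONSISTENCY <- ifelse(nested$ITEM <= 5, 1, 2)

# Crossed: every item is observed in both conditions, so the model can
# estimate the CONSISTENCY effect over and above item variability.
crossed <- expand.grid(SUBJECT = 1:20, ITEM = 1:10, CONSISTENCY = 1:2)

table(nested$CONSISTENCY, nested$ITEM)   # each item in one condition only
table(crossed$CONSISTENCY, crossed$ITEM) # each item in both conditions
```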

SCgeeker commented 7 years ago

Hi Peter,

Your concern was right. I changed my approach and found that power increases with the number of items.

```r
# Power curve with increasing items
sim_alleffects.item <- extend(sim_alleffects.lmer, along = "sim_ITEM", n = 300)

pc.interaction.item.samples <- powerCurve(sim_alleffects.item,
    test = fixed("sim_CONSISTENCY:sim_NEIGHBORHOOD", method = "t"),
    along = "sim_ITEM", nsim = 100, breaks = c(120, 180, 240, 300))
```

```
Power for predictor 'sim_CONSISTENCY:sim_NEIGHBORHOOD', (95% confidence interval),
by number of levels in sim_ITEM:
    120: 58.00% (47.71, 67.80) - 240 rows
    180: 79.00% (69.71, 86.51) - 360 rows
    240: 88.00% (79.98, 93.64) - 480 rows
    300: 93.00% (86.11, 97.14) - 600 rows
```

I'm wondering whether the number of items constrains the statistical power of the design like this.

pitakakariki commented 7 years ago
SCgeeker commented 6 years ago

Hi, I got some hints and fixed my problem. The issue was with the model specification, not a bug. My data have two factors, both within-subject and between-item. Months ago I ran simulations on a model with only subject and item intercepts:

`Y ~ A*B + (1|subject) + (1|item)`

Over the last few weeks I learned that this model does not capture all the random effects in my data. The model should be:

`Y ~ A*B + (1 + A*B|subject) + (1|item)`

With the latter model my simulations gave reasonable power estimates. My suggestion is to give users guidance on choosing the model terms based on the structure of their data.
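For reference, a self-contained sketch of that maximal model in simr. All names, effect sizes, and variances below are illustrative assumptions, not values from the paper:

```r
library(simr)

# Two within-subject, between-item factors: each subject sees every item,
# and each item belongs to exactly one A*B cell (4 items per cell).
dat <- expand.grid(subject = factor(1:20), item = factor(1:16))
cell <- (as.integer(dat$item) - 1) %/% 4
dat$A <- ifelse(cell %% 2 == 0, -0.5, 0.5)
dat$B <- ifelse(cell %/% 2 == 0, -0.5, 0.5)

fe <- c(1, 0.3, 0.2, 0.15)   # intercept, A, B, A:B (illustrative)
vc <- list(0.5 * diag(4),    # by-subject intercept + slopes for A, B, A:B
           0.1)              # by-item intercept variance

# Maximal by-subject structure, intercept only for items:
model <- makeLmer(y ~ A * B + (1 + A * B | subject) + (1 | item),
                  fixef = fe, VarCorr = vc, sigma = 1, data = dat)

powerSim(model, test = fixed("A:B", method = "t"), nsim = 10)
```

Because both factors are within-subject, the by-subject term gets slopes for `A`, `B`, and `A:B`; items get only an intercept because each item sits in a single cell.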