pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation

Powercurve for Interaction #154

Open lovingpast opened 5 years ago

lovingpast commented 5 years ago

Hi! I recently ran into a problem when running a power curve to calculate the sample size for an interaction.

The model I want to fit is:

fit <- lmer(EmotionValue ~ WTR*EmotionType + WTR + Relation + PSex + TSex +
            EmotionType + Wrongness + (1|Snr) + (1|Subject), data = datah1slct)

Basically, each participant has one WTR value and rates 2 different emotions, and we want to find out whether WTR has different effects on the two emotions. So in the model we mainly focus on the interaction between WTR and EmotionType.

We initially fitted the model and got an interaction effect of -1.3 (which is really large; we are also unsure whether this number is the standardised beta or the unstandardised B of the regression). We then found we could not assign the interaction term a fixed value directly, since it is not a real variable in the data. So we created a new variable, WTRbyET = WTR*EmotionType, and set its fixed effect to -1.3 before running a power curve. Since each participant contributes two rows of data, we also added the along="Subject" argument:

fit <- lmer(EmotionValue ~ WTRbyET + WTR + Relation + PSex + TSex +
            EmotionType + Wrongness + (1|Snr) + (1|Subject), data = datah1slct)

fixef(fit)["WTRbyET"] <- -1.3
powercurve1 <- powerCurve(fit, along = "Subject")
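(For reference, simr's fixef<- replacement method can also target an interaction coefficient by name, so a manual WTRbyET column is not strictly needed. A sketch, assuming EmotionType is coded so that the coefficient is named "WTR:EmotionType" — the exact name depends on the factor coding and can be checked with names(fixef(fit)):)

```r
library(lme4)
library(simr)

# Fit the model with the interaction written directly
# (assumes the same data frame datah1slct as above)
fit <- lmer(EmotionValue ~ WTR*EmotionType + Relation + PSex + TSex +
              Wrongness + (1|Snr) + (1|Subject), data = datah1slct)

# simr's fixef<- lets you overwrite any coefficient by name,
# including the interaction term
fixef(fit)["WTR:EmotionType"] <- -1.3

# Tell powerCurve explicitly which effect to test
powercurve1 <- powerCurve(fit, test = fixed("WTR:EmotionType"),
                          along = "Subject")
```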

Then, the result became weird:

Power for predictor 'WTRbyET', (95% confidence interval),  
by number of levels in Subject:  
3:  0.00% ( 0.00,  0.37) - 6 rows    
114:  0.00% ( 0.00,  0.37) - 228 rows    
225: 99.60% (98.98, 99.89) - 450 rows    
335: 100.0% (99.63, 100.0) - 670 rows    
446: 100.0% (99.63, 100.0) - 892 rows    
557: 100.0% (99.63, 100.0) - 1114 rows    
668: 100.0% (99.63, 100.0) - 1336 rows    
778: 100.0% (99.63, 100.0) - 1556 rows    
889: 100.0% (99.63, 100.0) - 1778 rows    
1000: 100.0% (99.63, 100.0) - 2000 rows

The power jumps abruptly from 0% to nearly 100%. Meanwhile, according to previous studies, the sample size needed to reach 80% power should never be less than 500. I am wondering if we did something wrong?

Thanks!

----------update---------- I ran another power curve without the along argument, and the result looks much more reasonable:

Power for predictor 'WTRbyET', (95% confidence interval),
by largest value of WTRbyET:
-0.796674530839672:  0.00% ( 0.00,  0.37) - 3 rows
-0.245276290683344:  0.00% ( 0.00,  0.37) - 114 rows
-0.0732327588773756:  0.00% ( 0.00,  0.37) - 225 rows
0.0905677112915569: 31.00% (28.14, 33.97) - 1335 rows
0.238326355757113: 47.60% (44.47, 50.75) - 1446 rows
0.378485806534831: 66.60% (63.58, 69.52) - 1556 rows
0.559574995310969: 79.20% (76.55, 81.68) - 1667 rows
0.756876323826524: 93.20% (91.46, 94.68) - 1778 rows
0.97718340782037: 98.40% (97.41, 99.08) - 1889 rows
1.97427623044644: 99.80% (99.28, 99.98) - 2000 rows

Does this mean that to reach around 80% power I need 1667 rows, i.e. 834 participants? Thanks!

pitakakariki commented 5 years ago

For your first power curve, if you have a large effect size then it makes sense that you don't need a large sample size. You also shouldn't be using an observed effect size here - you need to think about what effect size you need your study to detect.
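A minimal sketch of that workflow, assuming a hypothetical smallest effect of interest of -0.3 (a value you would justify from theory or prior literature, not from this data set):

```r
library(simr)

# Overwrite the fitted coefficient with the smallest effect size
# of interest, rather than the (inflated) observed estimate
fixef(fit)["WTRbyET"] <- -0.3

# Simulate power for that effect at the current sample size
powerSim(fit, test = fixed("WTRbyET"), nsim = 100)
```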

I don't think the second power curve makes sense - by default along uses the first x-variable, in this case WTRbyET. along=Subject makes much more sense.
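For example (a sketch; the breaks values here are arbitrary illustrations):

```r
library(simr)

# Power as a function of the number of subjects, evaluated at
# chosen subject counts rather than the automatic grid
powercurve2 <- powerCurve(fit, test = fixed("WTRbyET"),
                          along = "Subject",
                          breaks = c(100, 300, 500, 700, 900))
```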

Finally, it sounds like you're planning on running a large study with human participants, in which case you probably need to consult a statistician.