Closed psyargh closed 6 years ago
There are 1000 warnings, so that would be the best place to start. What's the output of `head(lastResult()$warnings)`?
I also suspect that there's over-fitting — it looks like you have five parameters for only seven subjects? That might be causing the model to understate the uncertainties. Are you able to share `summary(modelA)`?
Thank you! Here's my model summary:

```
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y ~ x1 + x2 + x3 + (1 | subject)
   Data: mydata

     AIC      BIC   logLik deviance df.resid
   311.6    332.4   -149.8    299.6      232

Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.8416 -1.1585  0.4890  0.7327  1.2528

Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 0.006061 0.07785
 Residual             0.202040 0.44949
Number of obs: 238, groups:  subject, 7

Fixed effects:
              Estimate Std. Error t value
(Intercept)  0.0289526  0.2588544   0.112
x1           0.0063143  0.0026297   2.401
x2           0.0443941  0.0362498   1.225
x3          -0.0007232  0.0037104  -0.195

Correlation of Fixed Effects:
   (Intr) x1     x2
x1 -0.930
x2 -0.257 -0.025
x3 -0.218  0.043  0.008
```
This is the output of `head(lastResult()$warnings)`:

```
> head(lastResult()$warnings)
    stage index                              message
1 Fitting     1 extra argument(s) ‘na.rm’ disregarded
2 Testing     1 extra argument(s) ‘na.rm’ disregarded
3 Testing     1 extra argument(s) ‘na.rm’ disregarded
4 Fitting     2 extra argument(s) ‘na.rm’ disregarded
5 Testing     2 extra argument(s) ‘na.rm’ disregarded
6 Testing     2 extra argument(s) ‘na.rm’ disregarded
```
Thank you!
I think there are still a few situations where simr struggles with missing data.

1. If there are `NA`s in `mydata`, you should call `mydata <- na.omit(mydata)` before passing it to `lmer`. That should get rid of the warnings.
2. Could you please try `fixef(modelA)["x1"] <- 0` and rerun the power calculation? Ideally you should get 5% power with a zero effect. If the power is still high with a zero effect, that will confirm that there is a problem with the model.

The subject random effect does look very small, which might point to identifiability issues. You may need to reset it with `VarCorr(modelA) <- ???`, where the actual value will need to be based on either subject-area knowledge or similar studies from the literature.
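As a minimal sketch, the zero-effect check could look like this with simr's replacement methods (the `nsim` value and the `0.25` variance are placeholders, not recommendations):

```r
library(simr)  # provides fixef<- and VarCorr<- replacement methods

# Null check: with the effect of interest set to zero,
# observed power should settle near the 5% type I error rate
fixef(modelA)["x1"] <- 0
powerSim(modelA, test = fixed("x1"), nsim = 100)

# If the subject variance looks implausibly small, override it with a
# literature-based value (0.25 here is purely illustrative)
VarCorr(modelA) <- 0.25
```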
Thank you! I appreciate you taking the time to help me run simr successfully!
In the meantime I realized I needed a binomial model, so I changed it. I did steps 1 & 2 like you suggested, but my power was around 14% when the fixed effect of x1 was set to 0. Does this mean that there's a mistake in the model? Here's my model:

```r
mydata <- na.omit(mydata)
summary(modelA <- glmer(y ~ x1 + x2 + x3 + (1|subject), family = binomial,
                        data = mydata, na.action = na.omit))
fixef(modelA)["x1"] <- 0
powerSim(modelA)
```
Here's the model output:

```
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: y ~ x1 + x2 + x3 + (1 | subject)
   Data: mydata

     AIC      BIC   logLik deviance df.resid
   272.9    290.0   -131.4    262.9      220

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.4415 -0.9323  0.4700  0.7200  1.1684

Random effects:
 Groups  Name        Variance Std.Dev.
 subject (Intercept) 0.2511   0.5011
Number of obs: 225, groups:  subject, 7

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.012131   0.277416   3.648 0.000264 ***
x1           0.045856   0.017866   2.567 0.010267 *
x2           0.101643   0.187686   0.542 0.588123
x3          -0.003022   0.018993  -0.159 0.873585
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
   (Intr) x1     x2
x1  0.253
x2 -0.009 -0.049
x3  0.226  0.036  0.021
```
I think the subject random effect is now much more reasonable/expected.
There's definitely a mistake here somewhere, because when I do

```r
modelB <- extend(modelA, along="x1", n=14)
powerSim(modelB)
```

I get even smaller power than when I use my original sample size (7).
The new model looks much better.

1. You might want to add an overdispersion term, i.e. you could add an observation-level index with `mydata$obs <- 1:nrow(mydata)` and then include a `(1|obs)` random effect in the model.
2. I'm not sure what `x1` measures, so I'm not sure if it's the correct variable to extend along. If it's, say, "time" and extending the study means running it for longer, then that's fine. If the intent was to recruit more subjects, you would extend along `subject`.
3. I would try running two `powerCurve` simulations, one with zero effect for `x1` and one with 0.02. Hopefully the null power curve settles down to 5%. For smaller samples the power might be inflated because the nominal 5% type I error rate is exceeded.
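Those three steps might be sketched like this (the `nsim` value, the extended `n = 14`, and the 0.02 effect are illustrative assumptions, not prescriptions):

```r
library(simr)

# 1. Observation-level random effect to absorb overdispersion
mydata$obs <- factor(1:nrow(mydata))
modelA2 <- glmer(y ~ x1 + x2 + x3 + (1|subject) + (1|obs),
                 family = binomial, data = mydata)

# 2. Extend along subject if the plan is to recruit more people
modelB <- extend(modelA2, along = "subject", n = 14)

# 3. Power curves under the null and under a plausible effect
fixef(modelB)["x1"] <- 0
pcNull <- powerCurve(modelB, along = "subject", nsim = 100)  # should flatten near 5%
fixef(modelB)["x1"] <- 0.02
pcAlt  <- powerCurve(modelB, along = "subject", nsim = 100)
```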
Thanks, I did 1 & 2. Re: 3. I get the following error message:
```
Error in sprintf("%7i: ", x$xval[i]) : invalid format '%7i'; use format %f, %e, %g or %a for numeric objects
```
That bug's been fixed in the latest github version.
Thank you! I believe my model works accurately now!
Sorry -- another thing didn't work! When I used a different outcome variable, which has a negative regression slope, my power analysis completely changed. It's now indicating that my power to detect that slope is much lower than the power to detect the positive slope with the original outcome variable. Is there something I should do differently when I'm looking to detect negative slope? Thanks!
Actually, it doesn't seem to be only about the negative slope. I changed the `fixef` to be positive and I still get a much lower power for the new outcome variable. The new and old variables are very similar.
But to illustrate, for the old variable, when I extend along "subject" to n=80, the power is 84%, for the new variable, it's 16%.
This is all in the identical model, just slightly different variables.
I fixed a few bugs with binomial models over the weekend. Is your binomial model just 0's and 1's? If so it should be fine, otherwise you might need to download the latest github version.
It's not unusual to get different power for different variables, but 16% does sound low for 80 subjects.
If the new variable has a much smaller range, that can make it difficult to detect an effect. Another thing I see quite a bit is that a variable has very small within-subject variation, which can make it difficult to separate the subject effect from the effect of the variable.
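A quick way to check the within-subject variation point — assuming the predictor in question is `x1` (base R only, no simr needed):

```r
# Per-subject standard deviation of x1; values near zero mean x1 barely
# varies within a subject, so its effect is hard to separate from (1|subject)
tapply(mydata$x1, mydata$subject, sd)
```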
Thank you! So what do you suggest I should do in order to trust this model (given that getting 16% power for 80 people sounds low)? My outcome variable has only 0s and 1s.
A handful of ideas:

1. Compare the two models, old and new outcome variables. Especially, how do the standard errors for the slope (relative to the slope) compare?
2. Plot one outcome variable against the other. Are they as similar as you expect?
3. If you email the data to simr.peter at gmail dot com, I might be able to have a look at it, although the next couple of months are looking quite busy so I can't promise anything.
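The first two checks could look something like this — the model names `modelOld`/`modelNew` and outcome names `y_old`/`y_new` are assumptions for illustration:

```r
# 1. Relative precision of the slope in each fitted model
coef(summary(modelOld))["x1", c("Estimate", "Std. Error")]
coef(summary(modelNew))["x1", c("Estimate", "Std. Error")]

# 2. Agreement between the two binary outcomes
table(old = mydata$y_old, new = mydata$y_new)
```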
Thank you! I'll try doing what you suggested and see what happens. You've already been very generous with your time-- much appreciated.
I believe though that one thing would make this analysis easier for me-- I have a hard time understanding unstandardized slope values of my predictors, thus making it hard to decide on the effect size I'm looking for. What's the best way to find out how these values correspond to standardized effect sizes?
Thank you!
Unfortunately in your case the effect sizes are odds ratios, and the general consensus is that no one really understands those!
I think the best way to get a sense of effect sizes in a binomial model is to see how predicted probabilities are affected.
For example, if the predicted probability for a particular subject was 50%, the effect size for `x1` was 0.02, and you increased `x1` by 1 unit, the new predicted probability would be 50.5%:

```r
> plogis(qlogis(0.5) + 0.02)
[1] 0.5049998
```

Whether that's large or small depends on the scale of `x1`, of course. E.g. if a typical change in `x1` was 10 units, then 50% would become 55%:

```r
> plogis(qlogis(0.5) + 0.02*10)
[1] 0.549834
```
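A tiny helper makes that conversion reusable (the function name `delta_prob` is mine, not part of simr; base R only):

```r
# Convert a baseline probability plus a logit-scale effect and a change in
# the predictor into the new predicted probability
delta_prob <- function(p0, beta, dx = 1) {
  plogis(qlogis(p0) + beta * dx)
}

delta_prob(0.5, 0.02)      # one-unit change,  ~0.505
delta_prob(0.5, 0.02, 10)  # ten-unit change,  ~0.550
```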
Hi, unfortunately I wasn't able to figure this out. I sent you an email at the address provided here! Thanks!
Hi,
I have another issue with this package and I'd love some help. I have some pilot data on 7 people and I used it to estimate power. However, even with such small sample, I get the power of 100%.
Here's my model:

```r
modelA <- lmer(y ~ x1 + x2 + x3 + (1|subject), data=mydata, REML=FALSE, na.rm = TRUE)
```
x2 is the number of days over which the participants took part in the study, and x3 is the number of items they saw on each day. I'm interested in finding the effect of x1 on y, Cohen's d = 0.2 (not sure how to specify that?)
This is the output I get:
This seems rather impossible and I'm not sure where my mistake is. Thanks!