pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation

convergence warnings when running power simulation #169

Open iris1293 opened 4 years ago

iris1293 commented 4 years ago

Hi there,

I'm trying to run a power analysis on simulated data with values based on a pilot study.

So this is the formula I used to determine these values with the pilot data:

model1f <- lmer(PainRating ~ 1 + TrialTotal + (1 + TrialTotal | pp_code), data = DataSet,
                control = lmerControl(optCtrl = list(maxfun = 1e+9), optimizer = "bobyqa"))

Here PainRating is a rating on a 0-100 scale and TrialTotal is just the trial number from 0-125. The model only converges when I increase the number of iterations and use bobyqa as the optimizer, probably because the pain ratings are not normally distributed. However, I do want to use lmer instead of glmer for interpretability, because it makes it easier to specify the different kinds of effects I want to estimate for the power analysis.

So I create my simulated dataset as follows:

pp_code <- factor(1:30)
TrialTotal <- 0:125
Group <- 0:2
Block <- 0:2
pp_code <- rep(pp_code, each=126)
TrialTotal <- rep(TrialTotal, 30)
Block <- rep(rep(Block, each=42), 30)
Group <- rep(rep(Group, each=126), 10)
covars <- data.frame(pp_code=pp_code, TrialTotal=TrialTotal, Group=factor(Group), Block=factor(Block))

So here I added a group variable with 3 different groups. This is not in the pilot data, because we only piloted the control condition.

I then created the following model:

model3a <- makeLmer(PainRating.t ~ TrialTotal*Group+(1+TrialTotal|pp_code), fixef=fixed, VarCorr=vars2, sigma=res, data=covars)

The values are based on the pilot data, with an estimated effect size of 0.05130 for the TrialTotal:Group2 interaction.

Linear mixed model fit by REML ['lmerMod']
Formula: PainRating.t ~ TrialTotal * Group + (1 + TrialTotal | pp_code)
   Data: covars

REML criterion at convergence: 37039.9

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.9220 -0.6887 -0.0052  0.6738  3.5562

Random effects:
 Groups   Name        Variance  Std.Dev. Corr
 pp_code  (Intercept) 506.69266 22.5098
          TrialTotal    0.03367  0.1835  -0.59
 Residual             268.82882 16.3960
Number of obs: 3780, groups:  pp_code, 30

Fixed effects:
                   Estimate Std. Error t value
(Intercept)        27.20332    7.17723   3.790
TrialTotal         -0.05915    0.05940  -0.996
Group1              0.00020   10.15013   0.000
Group2              0.00040   10.15013   0.000
TrialTotal:Group1   0.00030    0.08400   0.004
TrialTotal:Group2   0.05130    0.08400   0.611

Correlation of Fixed Effects:
            (Intr) TrlTtl Group1 Group2 TrT:G1
TrialTotal  -0.593
Group1      -0.707  0.419
Group2      -0.707  0.419  0.500
TrlTtl:Grp1  0.419 -0.707 -0.593 -0.297
TrlTtl:Grp2  0.419 -0.707 -0.297 -0.593  0.500
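
For context, a rough sketch of what fixed, vars2 and res contain, reconstructed from the summary above (not the exact code I used):

# Sketch of the makeLmer() inputs, with values read off the summary above
fixed <- c(27.20332, -0.05915, 0.00020, 0.00040, 0.00030, 0.05130)   # fixef, in the order shown
vars2 <- matrix(c(506.69266, -0.59 * 22.5098 * 0.1835,
                  -0.59 * 22.5098 * 0.1835, 0.03367), nrow = 2)      # intercept/slope (co)variance
res   <- 16.3960                                                     # residual SD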

However, when I then run the power analysis (I usually do 1000 simulations, but this was to check if it worked):

p_group <- powerSim(model3a, nsim=10, test = fcompare(PainRating~TrialTotal+Group))

every simulation produces a warning, and they all say the model failed to converge.

Does it fail to converge because it needs the extra iterations and bobyqa as the optimizer? If so, is it possible to pass those settings to the model, or is there a way to bypass the issue? Or could there be another reason for the convergence warnings?

Thanks in advance!

pitakakariki commented 4 years ago

I can't give you much help on convergence past what's supplied by ?convergence.

If you think the extra iterations will help, you could specify control arguments in the fitOpts argument to powerSim (although this has only had limited testing).

philturk commented 4 years ago

I know this was posted a couple of months ago, but someone else might run into the same issue. More specifically, try:

p_group <- powerSim(model3a, nsim=10, test = fcompare(PainRating~TrialTotal+Group),
                    fitOpts = list(control = lmerControl(optCtrl = list(maxfun = 2e5), optimizer = "bobyqa")))

I reduced the maximum number of function evaluations from your example.

Thanks for a great package, Peter.

TheDom42 commented 4 years ago

I'm running into basically the same issue while performing a post-hoc power analysis on an otherwise stable lmer model (i.e. all of the model assumptions are met and fitting the model with lmer works without convergence errors).

The model is specified this way:

m <- lmer(DV ~ factor(blockID.c) * condition * blockDiff + (poly(blockID.c, 4) | VP), data = df)

When I run powerSim to evaluate the power to detect the interaction, using this call:

power <- powerSim(m, test = fcompare(~ factor(blockID.c) + condition + blockDiff, "kr"))

the progress messages report singular fits and the summary also reports warnings:

Power for model comparison, (95% confidence interval):
      41.60% (38.52, 44.73)

Test: Kenward-Roger (package pbkrtest)
      Comparison to ~factor(blockID.c) + condition + blockDiff + [re]

Based on 1000 simulations, (757 warnings, 0 errors)
alpha = 0.05, nrow = 288

Time elapsed: 1 h 5 m 46 s

nb: result might be an observed power calculation

Calling power$warnings also shows many convergence warnings:

     stage index                                                                         message
1  Testing     1         Model failed to converge with 2 negative eigenvalues: -4.5e-04 -9.4e-01
2  Fitting     2 Model failed to converge with max|grad| = 0.00254957 (tol = 0.002, component 1)
3  Testing     2  Model failed to converge with max|grad| = 0.0046813 (tol = 0.002, component 1)
4  Testing     3                   Model failed to converge with 1 negative eigenvalue: -4.6e-02
5  Fitting     4                                              unable to evaluate scaled gradient
6  Fitting     4       Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
7  Fitting     4                   Model failed to converge with 1 negative eigenvalue: -1.7e-02
8  Fitting     6                   Model failed to converge with 1 negative eigenvalue: -3.9e-02
9  Testing     6 Model failed to converge with max|grad| = 0.00246762 (tol = 0.002, component 1)
10 Testing     7                   Model failed to converge with 1 negative eigenvalue: -1.5e-01

I also tried the solution proposed by @philturk, but it did not seem to solve the problem in my case. I'm just a bit unsure whether to simply ignore the warnings (the power estimate seems sensible in my case) or whether I need to do some further digging.

pitakakariki commented 4 years ago

My first guess is that your model is just too complex to be reliably estimated from your study design.

If you want to dig deeper, you could try:

set.seed(1)
summary(doFit(doSim(m), m))

You might need to try a handful of random number seeds until you get one that doesn't converge.
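
For example, something like this (an untested sketch) would flag which seeds produce lme4 convergence messages:

# Untested sketch: refit the model to several simulated responses and
# print any lme4 convergence messages, so you can find a failing seed.
for (s in 1:20) {
  set.seed(s)
  refit <- doFit(doSim(m), m)
  msgs  <- refit@optinfo$conv$lme4$messages
  if (!is.null(msgs)) cat("seed", s, ":", paste(msgs, collapse = "; "), "\n")
}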

TheDom42 commented 4 years ago

Thank you very much for the swift reply! And great package of course.

I was just a little surprised because a regular lmer fit had no problems estimating it. Does this mean I was probably just lucky with the lmer estimation, and the model is somewhere on the borderline of being estimable?

More importantly: given that so many warnings occurred, can I trust the results, or do they directly depend on the number of successful estimations? The summary print for this power analysis shows 416 of 1000 successful simulations (which is exactly the power calculated), so I'm unsure about trusting the result.

pitakakariki commented 4 years ago

Since simr is simulating and refitting with the same model (at least in the "fitting" phase), technically we know that we're fitting the "true" model in these cases. So it's possible your data just happened to converge because you were lucky.

The successes column in summary is just the number of true positives during the simulation (i.e. the power, which is why it matches exactly). If you want the number of simulations that produced warnings, you could try length(unique(power$warnings$index)).

You could also extract the p-values with power$pval if you want to see whether the false negatives and convergence warnings line up.
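
For example (a sketch, using the power object from your run and alpha = 0.05):

# Sketch: cross-tabulate which simulations produced warnings against
# which were significant, to see if warnings and false negatives line up.
bad <- unique(power$warnings$index)
length(bad)                                     # simulations with at least one warning
table(warned      = seq_along(power$pval) %in% bad,
      significant = power$pval < 0.05)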

In general I wouldn't necessarily trust the results with this many warnings. I'd read them as "if I ran this experiment again, there's a high chance I'd get convergence errors next time". That could potentially wreck the frequentist properties of your inference, depending on how you'd handle that situation (e.g. by refitting with a simpler model).

It does to some extent depend on why the models aren't converging. I get a lot of boundary estimates nowadays because I'll control for e.g. (1|day) in a lab trial, but the experimenters have controlled the setting so well that the variation over several days is negligible.