pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation
69 stars 19 forks source link

inconsistent results with effectively identical model formulae using fcompare vs. compare #153

Open ahafri opened 5 years ago

ahafri commented 5 years ago

I am puzzled by differences in formulae that outside of simr should be effectively identical and result in identical models.

This is the original model formula:

lm.7.p <- lmer(RT_inv ~ 1 + exp_version + verb_telicity * motion_boundedness + verb_domain * motion_boundedness +
                 (1 + verb_telicity | subjNum) + 
                 (1 | verb), 
               data=data.trial.w, 
               REML=F)

Comparisons to a simpler model without the verb_telicity * motion_boundedness interaction outside of simr (using lmer for fits and comparisons) results in a fit like the below:

         Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
lm.simple 13 79630 79717 -39802    79604                         
lm.7.p    14 79627 79720 -39800    79599 5.0684      1    0.02437 *

This is true whether the simpler model is specified with the explicit main effect term motion_boundedness or whether it's implied by the asterisk in the formula notation: RT_inv ~ 1 + exp_version + verb_telicity + verb_domain * motion_boundedness vs. RT_inv ~ 1 + exp_version + verb_telicity + motion_boundedness + verb_domain * motion_boundedness i.e., the output for model fits using lmer/anova is the same regardless of the above two specifications.

But in simr, these result in different comparison results (which carry over into the power analyses). But when I use the full "compare" function, this doesn't seem to be an issue (i.e. both versions of the formula give the same output), making me think it's something with how simr (and in particular, fcompare) parses formulae:

> # fine: without motion_boundedness, using fcompare
> doTest(lm.7.p, 
+        nsim=20, 
+        test = fcompare(RT_inv ~ exp_version + verb_telicity + verb_domain * motion_boundedness))
p-value for model comparison: 0.02436587
          --------------------
Test: Likelihood ratio
      Comparison to RT_inv ~ exp_version + verb_telicity + verb_domain * motion_boundedness + [re]
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00962303 (tol = 0.002, component 1)
> # error/inconsistency WITH extra motion_boundedness (using fcompare)
> doTest(lm.7.p, 
+        nsim=20, 
+        test = fcompare(RT_inv ~ exp_version + verb_telicity + motion_boundedness + verb_domain * motion_boundedness))
p-value for model comparison: 1
          --------------------
Test: Likelihood ratio
      Comparison to RT_inv ~ exp_version + verb_telicity + motion_boundedness + verb_domain *  + [re]
Warning messages:
1: In description[2] <- str_c("Comparison to ", deparse(formula(model)),  :
  number of items to replace is not a multiple of replacement length
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  unable to evaluate scaled gradient
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge: degenerate  Hessian with 3 negative eigenvalues
> # fine: full compare WITH extra motion_boundedness
> doTest(lm.7.p, 
+        nsim=20, 
+        test = compare(RT_inv ~ 1 + exp_version + verb_telicity + motion_boundedness + verb_domain * motion_boundedness + 
+                         (1 + verb_telicity | subjNum) + 
+                         (1 | verb)))
p-value for model comparison: 0.02436587
          --------------------
Test: Likelihood ratio
      Comparison to RT_inv ~ 1 + exp_version + verb_telicity + motion_boundedness + 
Warning messages:
1: In description[2] <- str_c("Comparison to ", deparse(formula(model))) :
  number of items to replace is not a multiple of replacement length
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00962303 (tol = 0.002, component 1)
> # fine: full compare withOUT extra motion_boundedness
> doTest(lm.7.p, 
+        nsim=20, 
+        test = compare(RT_inv ~ 1 + exp_version + verb_telicity + verb_domain * motion_boundedness + 
+                         (1 + verb_telicity | subjNum) + 
+                         (1 | verb)))
p-value for model comparison: 0.02436587
          --------------------
Test: Likelihood ratio
      Comparison to RT_inv ~ 1 + exp_version + verb_telicity + verb_domain * motion_boundedness + 
Warning messages:
1: In description[2] <- str_c("Comparison to ", deparse(formula(model))) :
  number of items to replace is not a multiple of replacement length
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00962303 (tol = 0.002, component 1)

Thanks for the help and for this awesome package!

pitakakariki commented 5 years ago

Thanks for the bug report.

It looks like the problem is with parsing the long formula - deparse is splitting it into a character vector of length 3 but simr is assuming it will get a single character string.

I'll add it to my list of things to fix. In the meantime, you can either shorten your variable names or use the workarounds you've already found.