strengejacke / sjstats

Effect size measures and significance tests
https://strengejacke.github.io/sjstats

`lmer` p-values: discrepancy between lmerTest and sjstats with identical standard errors #38

Closed · IndrajeetPatil closed this issue 6 years ago

IndrajeetPatil commented 6 years ago

In their "Keep it maximal" paper (http://talklab.psy.gla.ac.uk/KeepItMaximalR2.pdf), Barr and colleagues list several ways one can compute p-values for lmer models:

There are various ways to obtain p-values from LMEMs, and to our knowledge, there is little agreement on which method to use. Therefore, we considered three methods currently in practice: (1) treating the t-statistic as if it were a z statistic (i.e., using the standard normal distribution as a reference); (2) performing likelihood ratio tests, in which the deviance (−2LL) of a model containing the fixed effect is compared to another model without it but that is otherwise identical in random effects structure; and (3) by Markov Chain Monte Carlo (MCMC) sampling, using the mcmcsamp() function of lme4 with 10000 iterations.
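
For concreteness, method (1), the one relevant below, amounts to the following (a minimal sketch with made-up numbers, not taken from any model in this issue):

# method (1): treat the t statistic as if it were a z statistic
estimate <- 0.34    # hypothetical fixed-effect estimate
std_err  <- 0.083   # hypothetical standard error
t_stat   <- estimate / std_err
2 * pnorm(abs(t_stat), lower.tail = FALSE)  # two-sided p against N(0, 1)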

If I am not mistaken, both lmerTest and sjstats use the first approach to compute the p-values for linear mixed-effects models?

What confuses me is why the p-values differ between lmerTest and sjstats, especially when p.kr = FALSE and the standard errors for the estimates from the two functions (lmerTest::as_lmerModLmerTest and sjstats::p_value) are identical. I'm a bit worried because the differences are pretty big, with p-values sometimes going from p < 0.01 to p < 0.001. I read the details for the sjstats function, but still couldn't come up with an explanation.


# libraries needed
library(lme4)
library(lmerTest)
library(tibble)
library(sjstats)
library(ggstatsplot)

# two different models
mod1 <-
  lme4::lmer(
    formula = scale(rating) ~ scale(budget) + (budget | genre),
    REML = FALSE,
    data = ggstatsplot::movies_long
  )
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
#> control$checkConv, : Model failed to converge with max|grad| = 0.0041235
#> (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?

mod2 <-
  lme4::lmer(
    formula = scale(rating) ~ scale(budget) + scale(length) + (budget + length | genre),
    REML = FALSE,
    data = ggstatsplot::movies_long
  )
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
#> control$checkConv, : Model failed to converge with max|grad| = 1.53644 (tol
#> = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
#>  - Rescale variables?
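
(Not part of the original reprex: the "Rescale variables?" hints above point at the unscaled budget and length terms in the random-effects part of the formulas; one untested variant that also scales the random slope would be:)

# untested sketch: scale the random-slope term as well
mod1_rescaled <-
  lme4::lmer(
    formula = scale(rating) ~ scale(budget) + (scale(budget) | genre),
    REML = FALSE,
    data = ggstatsplot::movies_long
  )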

# p-values with lmerTest
coef(summary(lmerTest::as_lmerModLmerTest(mod1))) %>%
  tibble::as_data_frame(x = .)
#> # A tibble: 2 x 5
#>   Estimate `Std. Error`    df `t value` `Pr(>|t|)`
#>      <dbl>        <dbl> <dbl>     <dbl>      <dbl>
#> 1  -0.0165       0.107   6.34    -0.155      0.882
#> 2   0.0765       0.0413  5.92     1.85       0.114

coef(summary(lmerTest::as_lmerModLmerTest(mod2))) %>%
  tibble::as_data_frame(x = .)
#> # A tibble: 3 x 5
#>   Estimate `Std. Error`    df `t value` `Pr(>|t|)`
#>      <dbl>        <dbl> <dbl>     <dbl>      <dbl>
#> 1   0.0202       0.0819  5.54     0.247     0.814 
#> 2  -0.0781       0.0317  4.81    -2.47      0.0587
#> 3   0.344        0.0831  4.73     4.14      0.0101

# p-values with sjstats, without KR approximation
sjstats::p_value(fit = mod1, p.kr = FALSE) %>%
  tibble::as_data_frame(x = .)

#> Computing p-values via Wald-statistics approximation (treating t as Wald z).
#> # A tibble: 2 x 3
#>   term          p.value std.error
#>   <fct>           <dbl>     <dbl>
#> 1 (Intercept)    0.877     0.107 
#> 2 scale(budget)  0.0642    0.0413

sjstats::p_value(fit = mod2, p.kr = FALSE) %>%
  tibble::as_data_frame(x = .)

#> Computing p-values via Wald-statistics approximation (treating t as Wald z).
#> # A tibble: 3 x 3
#>   term            p.value std.error
#>   <fct>             <dbl>     <dbl>
#> 1 (Intercept)   0.805        0.0819
#> 2 scale(budget) 0.0137       0.0317
#> 3 scale(length) 0.0000352    0.0831
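
(As a quick sanity check, not part of the original reprex: the Wald p-values above can be reproduced by hand from the lmerTest t values, treating t as z.)

# reproduce the Wald p for scale(length) in mod2 from the t value
# in the lmerTest table above (t = 4.14, treated as z)
2 * pnorm(4.14, lower.tail = FALSE)
# ≈ 3.5e-05; cf. the 0.0000352 reported by sjstats (t is rounded here)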

# p-values with sjstats, with KR approximation

sjstats::p_value(fit = mod1, p.kr = TRUE) %>%
  tibble::as_data_frame(x = .)

#> Computing p-values via Kenward-Roger approximation. Use `p.kr = FALSE` if computation takes too long.
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| = 0.0033352 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?
#> [the same two warnings repeat twice more]

#> # A tibble: 2 x 3
#>   term          p.value std.error
#>   <fct>           <dbl>     <dbl>
#> 1 (Intercept)     0.893    0.118 
#> 2 scale(budget)   0.167    0.0485

sjstats::p_value(fit = mod2, p.kr = TRUE) %>%
  tibble::as_data_frame(x = .)

#> Computing p-values via Kenward-Roger approximation. Use `p.kr = FALSE` if computation takes too long.
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| = 0.672962 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
#>  - Rescale variables?
#> [the same two warnings repeat three more times]

#> # A tibble: 3 x 3
#>   term          p.value std.error
#>   <fct>           <dbl>     <dbl>
#> 1 (Intercept)   0.832      0.0912
#> 2 scale(budget) 0.102      0.0387
#> 3 scale(length) 0.00760    0.101

Created on 2018-07-16 by the reprex package (v0.2.0).

strengejacke commented 6 years ago

I think lmerTest uses the Satterthwaite approximation for the df by default. sjstats uses only the "quick" Wald solution or the Kenward-Roger approximation.
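
For reference, lmerTest exposes that choice through the ddf argument of its summary() method; a short sketch, using mod1 from the reprex above:

# the df method can be chosen in lmerTest's summary()
m1 <- lmerTest::as_lmerModLmerTest(mod1)
summary(m1, ddf = "Satterthwaite")  # lmerTest's default
summary(m1, ddf = "Kenward-Roger")  # should correspond to p_value(..., p.kr = TRUE)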

IndrajeetPatil commented 6 years ago

Thanks. That explains the differences between the outputs when p.kr = TRUE.

But why are the results different when p.kr = FALSE? In that case the Kenward-Roger approximation is not being used, right? So why do the p-values still differ?

strengejacke commented 6 years ago

Because p_value() calculates Wald p-values, while lmerTest calculates Satterthwaite. See ?lmerTest::summary for the options.
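
(To make that concrete with the scale(budget) row of mod2, where t ≈ -2.47 and the Satterthwaite df ≈ 4.81, both taken from the tables above: the t statistic is identical, only the reference distribution changes.)

2 * pnorm(abs(-2.47), lower.tail = FALSE)          # ≈ 0.0135, sjstats' Wald p (0.0137 with unrounded t)
2 * pt(abs(-2.47), df = 4.81, lower.tail = FALSE)  # ≈ 0.059, lmerTest's Satterthwaite p (0.0587)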

strengejacke commented 6 years ago

I think that if you pass a lmerModLmerTest model to p_value(), it returns the same p-values as summary() (can't check right now, sorry for the short answers, I'm on my mobile phone).
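
(An untested sketch of that check, assuming p_value() dispatches on the lmerModLmerTest class:)

mod1_lt <- lmerTest::as_lmerModLmerTest(mod1)
sjstats::p_value(fit = mod1_lt)  # if so, should match...
coef(summary(mod1_lt))           # ...the Pr(>|t|) column here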