vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com
Other
466 stars 47 forks source link

NA values for SE/CI for pairwise-comparisons when using lme4, but not for glmmTMB #811

Closed strengejacke closed 1 year ago

strengejacke commented 1 year ago

Maybe a follow-up to #810. The following examples where I use hypothesis = "pairwise" work for glmmTMB (with too small standard errors), but not for lmer models (returning NA).

library(easystats)
library(lme4)
data(qol_cancer)

qol_cancer <- qol_cancer |>
  # recode age into three groups
  categorize(select = "age", split = "quantile", n_groups = 3) |>
  # numeric to factors, set labels as levels
  to_factor(select = c("hospital", "education", "age"))

qol_cancer$strata <- ifelse(
  is.na(qol_cancer$hospital) | is.na(qol_cancer$education) | is.na(qol_cancer$age),
  NA_character_,
  paste0(qol_cancer$hospital, ", ", qol_cancer$education, ", ", qol_cancer$age)
)
qol_cancer$strata <- factor(qol_cancer$strata)

m_null <- lmer(QoL ~ 1 + (1 | strata), data = qol_cancer)
dg <- marginaleffects::datagrid(model = m_null, by = "strata")

marginaleffects::predictions(
  m_null,
  newdata = dg
)
#> 
#>  Estimate Std. Error    z Pr(>|z|) 2.5 % 97.5 %     strata
#>      73.3       2.05 35.8   <0.001  69.3   77.3 0, high, 1
#>      69.3       2.05 33.9   <0.001  65.3   73.3 0, high, 2
#>      79.2       2.05 38.7   <0.001  75.2   83.2 0, high, 3
#>      71.5       2.05 34.9   <0.001  67.5   75.5 0, mid, 1 
#>      72.6       2.05 35.5   <0.001  68.6   76.6 0, mid, 2 
#>      77.2       2.05 37.8   <0.001  73.2   81.2 0, mid, 3 
#>      78.6       2.05 38.4   <0.001  74.6   82.7 1, high, 1
#>      80.4       2.05 39.3   <0.001  76.4   84.4 1, high, 2
#>      78.5       2.05 38.4   <0.001  74.5   82.5 1, high, 3
#>      66.8       2.05 32.6   <0.001  62.8   70.8 1, low, 1 
#>      68.9       2.05 33.7   <0.001  64.8   72.9 1, low, 2 
#>      65.6       2.05 32.1   <0.001  61.6   69.7 1, low, 3 
#>      69.7       2.05 34.1   <0.001  65.7   73.7 1, mid, 1 
#>      75.8       2.05 37.0   <0.001  71.8   79.8 1, mid, 2 
#>      76.6       2.05 37.4   <0.001  72.6   80.6 1, mid, 3 
#> 
#> Columns: rowid, estimate, std.error, statistic, p.value, conf.low, conf.high, QoL, strata

marginaleffects::avg_predictions(
  m_null,
  by = "strata"
)
#> 
#>      strata Estimate Std. Error    z Pr(>|z|) 2.5 % 97.5 %
#>  1, mid, 1      69.7       2.05 34.1   <0.001  65.7   73.7
#>  1, mid, 2      75.8       2.05 37.0   <0.001  71.8   79.8
#>  1, mid, 3      76.6       2.05 37.4   <0.001  72.6   80.6
#>  1, high, 3     78.5       2.05 38.4   <0.001  74.5   82.5
#>  1, high, 2     80.4       2.05 39.3   <0.001  76.4   84.4
#>  1, low, 1      66.8       2.05 32.6   <0.001  62.8   70.8
#>  1, low, 3      65.6       2.05 32.1   <0.001  61.6   69.7
#>  0, mid, 1      71.5       2.05 34.9   <0.001  67.5   75.5
#>  0, high, 1     73.3       2.05 35.8   <0.001  69.3   77.3
#>  1, low, 2      68.9       2.05 33.7   <0.001  64.8   72.9
#>  1, high, 1     78.6       2.05 38.4   <0.001  74.6   82.7
#>  0, high, 3     79.2       2.05 38.7   <0.001  75.2   83.2
#>  0, high, 2     69.3       2.05 33.9   <0.001  65.3   73.3
#>  0, mid, 2      72.6       2.05 35.5   <0.001  68.6   76.6
#>  0, mid, 3      77.2       2.05 37.8   <0.001  73.2   81.2
#> 
#> Columns: rowid, strata, estimate, std.error, statistic, p.value, conf.low, conf.high, rowid_dedup

# NA for SE and CI
marginaleffects::predictions(
  m_null,
  newdata = dg,
  hypothesis = "pairwise"
)
#> 
#>             Term Estimate Std. Error  z Pr(>|z|) 2.5 % 97.5 %
#>  Row 1 - Row 2      3.969         NA NA       NA    NA     NA
#>  Row 1 - Row 3     -5.934         NA NA       NA    NA     NA
#>  Row 1 - Row 4      1.805         NA NA       NA    NA     NA
#>  Row 1 - Row 5      0.661         NA NA       NA    NA     NA
#>  Row 1 - Row 6     -3.969         NA NA       NA    NA     NA
#> --- 95 rows omitted. See ?avg_predictions and ?print.marginaleffects --- 
#>  Row 12 - Row 14  -10.122         NA NA       NA    NA     NA
#>  Row 12 - Row 15  -10.951         NA NA       NA    NA     NA
#>  Row 13 - Row 14   -6.067         NA NA       NA    NA     NA
#>  Row 13 - Row 15   -6.896         NA NA       NA    NA     NA
#>  Row 14 - Row 15   -0.829         NA NA       NA    NA     NA
#> Columns: term, estimate, std.error, statistic, p.value, conf.low, conf.high

# NA for SE and CI
marginaleffects::avg_predictions(
  m_null,
  by = "strata",
  hypothesis = "pairwise"
)
#> 
#>                    Term Estimate Std. Error         z Pr(>|z|) 2.5 % 97.5 %
#>  1, mid, 1 - 1, mid, 2     -6.07   3.95e-12 -1.54e+12   <0.001 -6.07  -6.07
#>  1, mid, 1 - 1, mid, 3     -6.90         NA        NA       NA    NA     NA
#>  1, mid, 1 - 1, high, 3    -8.79         NA        NA       NA    NA     NA
#>  1, mid, 1 - 1, high, 2   -10.71         NA        NA       NA    NA     NA
#>  1, mid, 1 - 1, low, 1      2.91         NA        NA       NA    NA     NA
#> --- 95 rows omitted. See ?print.marginaleffects --- 
#>  0, high, 3 - 0, mid, 2     6.60         NA        NA       NA    NA     NA
#>  0, high, 3 - 0, mid, 3     1.97         NA        NA       NA    NA     NA
#>  0, high, 2 - 0, mid, 2    -3.31         NA        NA       NA    NA     NA
#>  0, high, 2 - 0, mid, 3    -7.94         NA        NA       NA    NA     NA
#>  0, mid, 2 - 0, mid, 3     -4.63         NA        NA       NA    NA     NA
#> Columns: term, estimate, std.error, statistic, p.value, conf.low, conf.high

m_null <- lmer(QoL ~ 1 + (1 | hospital:education:age), data = qol_cancer)
dg <- marginaleffects::datagrid(model = m_null, by = c("hospital", "education", "age"))

marginaleffects::predictions(
  m_null,
  newdata = dg
)
#> 
#>  Estimate Std. Error    z Pr(>|z|) 2.5 % 97.5 % hospital education age
#>      66.8       2.05 32.6   <0.001  62.8   70.8        1      low    1
#>      71.5       2.05 34.9   <0.001  67.5   75.5        0      mid    1
#>      69.7       2.05 34.1   <0.001  65.7   73.7        1      mid    1
#>      73.3       2.05 35.8   <0.001  69.3   77.3        0      high   1
#>      78.6       2.05 38.4   <0.001  74.6   82.7        1      high   1
#>      68.9       2.05 33.7   <0.001  64.8   72.9        1      low    2
#>      72.6       2.05 35.5   <0.001  68.6   76.6        0      mid    2
#>      75.8       2.05 37.0   <0.001  71.8   79.8        1      mid    2
#>      69.3       2.05 33.9   <0.001  65.3   73.3        0      high   2
#>      80.4       2.05 39.3   <0.001  76.4   84.4        1      high   2
#>      65.6       2.05 32.1   <0.001  61.6   69.7        1      low    3
#>      77.2       2.05 37.8   <0.001  73.2   81.2        0      mid    3
#>      76.6       2.05 37.4   <0.001  72.6   80.6        1      mid    3
#>      79.2       2.05 38.7   <0.001  75.2   83.2        0      high   3
#>      78.5       2.05 38.4   <0.001  74.5   82.5        1      high   3
#> 
#> Columns: rowid, estimate, std.error, statistic, p.value, conf.low, conf.high, QoL, hospital, education, age

marginaleffects::avg_predictions(
  m_null,
  by = c("hospital", "education", "age")
)
#> 
#>  hospital education age Estimate Std. Error    z Pr(>|z|) 2.5 % 97.5 %
#>         1      mid    1     69.7       2.05 34.1   <0.001  65.7   73.7
#>         1      mid    2     75.8       2.05 37.0   <0.001  71.8   79.8
#>         1      mid    3     76.6       2.05 37.4   <0.001  72.6   80.6
#>         1      high   3     78.5       2.05 38.4   <0.001  74.5   82.5
#>         1      high   2     80.4       2.05 39.3   <0.001  76.4   84.4
#>         1      low    1     66.8       2.05 32.6   <0.001  62.8   70.8
#>         1      low    3     65.6       2.05 32.1   <0.001  61.6   69.7
#>         0      mid    1     71.5       2.05 34.9   <0.001  67.5   75.5
#>         0      high   1     73.3       2.05 35.8   <0.001  69.3   77.3
#>         1      low    2     68.9       2.05 33.7   <0.001  64.8   72.9
#>         1      high   1     78.6       2.05 38.4   <0.001  74.6   82.7
#>         0      high   3     79.2       2.05 38.7   <0.001  75.2   83.2
#>         0      high   2     69.3       2.05 33.9   <0.001  65.3   73.3
#>         0      mid    2     72.6       2.05 35.5   <0.001  68.6   76.6
#>         0      mid    3     77.2       2.05 37.8   <0.001  73.2   81.2
#> 
#> Columns: rowid, hospital, education, age, estimate, std.error, statistic, p.value, conf.low, conf.high, rowid_dedup

# NA for SE and CI
marginaleffects::predictions(
  m_null,
  newdata = dg,
  hypothesis = "pairwise"
)
#> 
#>             Term Estimate Std. Error  z Pr(>|z|) 2.5 % 97.5 %
#>  Row 1 - Row 2     -4.677         NA NA       NA    NA     NA
#>  Row 1 - Row 3     -2.911         NA NA       NA    NA     NA
#>  Row 1 - Row 4     -6.482         NA NA       NA    NA     NA
#>  Row 1 - Row 5    -11.856         NA NA       NA    NA     NA
#>  Row 1 - Row 6     -2.067         NA NA       NA    NA     NA
#> --- 95 rows omitted. See ?avg_predictions and ?print.marginaleffects --- 
#>  Row 12 - Row 14   -1.965         NA NA       NA    NA     NA
#>  Row 12 - Row 15   -1.247         NA NA       NA    NA     NA
#>  Row 13 - Row 14   -2.609         NA NA       NA    NA     NA
#>  Row 13 - Row 15   -1.891         NA NA       NA    NA     NA
#>  Row 14 - Row 15    0.718         NA NA       NA    NA     NA
#> Columns: term, estimate, std.error, statistic, p.value, conf.low, conf.high

# NA for SE and CI
marginaleffects::avg_predictions(
  m_null,
  by = c("hospital", "education", "age"),
  hypothesis = "pairwise"
)
#> 
#>                    Term Estimate Std. Error         z Pr(>|z|) 2.5 % 97.5 %
#>  1, mid, 1 - 1, mid, 2     -6.07   3.95e-12 -1.54e+12   <0.001 -6.07  -6.07
#>  1, mid, 1 - 1, mid, 3     -6.90         NA        NA       NA    NA     NA
#>  1, mid, 1 - 1, high, 3    -8.79         NA        NA       NA    NA     NA
#>  1, mid, 1 - 1, high, 2   -10.71         NA        NA       NA    NA     NA
#>  1, mid, 1 - 1, low, 1      2.91         NA        NA       NA    NA     NA
#> --- 95 rows omitted. See ?print.marginaleffects --- 
#>  0, high, 3 - 0, mid, 2     6.60         NA        NA       NA    NA     NA
#>  0, high, 3 - 0, mid, 3     1.97         NA        NA       NA    NA     NA
#>  0, high, 2 - 0, mid, 2    -3.31         NA        NA       NA    NA     NA
#>  0, high, 2 - 0, mid, 3    -7.94         NA        NA       NA    NA     NA
#>  0, mid, 2 - 0, mid, 3     -4.63         NA        NA       NA    NA     NA
#> Columns: term, estimate, std.error, statistic, p.value, conf.low, conf.high

Created on 2023-06-09 with reprex v2.0.2

vincentarelbundock commented 1 year ago

This is weird. I can't figure out what is happening, but one hint is that things seem to work OK when the model includes another (fixed effect) term in addition to the intercept. The NA mostly seem to arise when the model has random effects plus just an intercept.

Perhaps there is something wrong in set_coef() in R/methods_lme4.R, but I can't see it now...

vincentarelbundock commented 1 year ago

OK, so I’ve investigated and concluded that this is not a bug, but rather a logical/mathematical constraint of the model with a single intercept and no regressor.

Recall that to compute the standard errors we first need to get a Jacobian, where each column represents the vector of partial derivatives with respect to each parameter. For example:

library(lme4)
library(marginaleffects)
m <- lmer(Sepal.Length ~ 1 + (1 | Species), data = iris)

avg_predictions(m)
# 
#  Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
#      5.84      0.459 12.7   <0.001 120.9  4.94   6.74
# 
# Columns: estimate, std.error, statistic, p.value, s.value, conf.low, conf.high

avg_predictions(m, by = "Species") |> attr("jacobian")
#      (Intercept)
# [1,]           1
# [2,]           1
# [3,]           1

The Jacobian shows that increasing the first parameter by 1 increases each of the predictions in each row by 1. Trivial…

Now consider what happens to the Jacobian when we do pairwise comparisons:

avg_predictions(m, by = "Species", hypothesis = "pairwise")
# 
#                    Term    Species Estimate Std. Error         z Pr(>|z|)   S
#  setosa - versicolor    setosa       -0.922         NA        NA       NA  NA
#  setosa - virginica     versicolor   -1.569   6.98e-13 -2.25e+12   <0.001 Inf
#  versicolor - virginica virginica    -0.647   6.98e-13 -9.27e+11   <0.001 Inf
#   2.5 % 97.5 %
#      NA     NA
#  -1.569 -1.569
#  -0.647 -0.647
# 
# Columns: rowid, term, Species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, rowid_dedup

avg_predictions(m, by = "Species", hypothesis = "pairwise") |> attr("jacobian")
#        (Intercept)
# [1,]  0.000000e+00
# [2,] -1.519986e-12
# [3,] -1.519986e-12

When we increase the (Intercept) parameter by 1 unit, none of the comparisons change. This is because each prediction changes by the exact same amount, so the increase in the difference between any two predictions is zero.

Since all entries in the Jacobian are 0, then doing J’VJ will also yield 0 standard errors, and some NA entries.

So this is not a bug. It is just a non-sensical query to ask for standard errors for pairwise comparisons between predictions in a model with only an intercept.

Makes sense?

The reason we don’t get 0 results for glmmTMB is that the Jacobian also includes the variance parameters. But given Issue #810, I’m not sure I trust that.

library(glmmTMB)
m <- glmmTMB(Sepal.Length ~ 1 + (1 | Species), data = iris)

avg_predictions(m, by = "Species", hypothesis = "pairwise") |> attr("jacobian")
#        (Intercept) d~(Intercept) theta_1|Species.1
# [1,]  2.025964e-11   0.011549827       -0.02309817
# [2,] -2.025964e-11   0.019647125       -0.03929173
# [3,] -4.051928e-11   0.008097298       -0.01619356
strengejacke commented 1 year ago

Thanks for looking into this!

Makes sense?

Yes, at least for fixed effects. I was hoping that when random effect parameters are of interest in pairwise comparisons, SEs for those are taken into account. When we do predictions, we have SEs/CIs for those predictions:

library(easystats)
library(lme4)
data(qol_cancer)

qol_cancer <- qol_cancer |>
  # recode age into three groups
  categorize(select = "age", split = "quantile", n_groups = 3) |>
  # numeric to factors, set labels as levels
  to_factor(select = c("hospital", "education", "age"))

qol_cancer$strata <- ifelse(
  is.na(qol_cancer$hospital) | is.na(qol_cancer$education) | is.na(qol_cancer$age),
  NA_character_,
  paste0(qol_cancer$hospital, ", ", qol_cancer$education, ", ", qol_cancer$age)
)
qol_cancer$strata <- factor(qol_cancer$strata)

m_null <- lmer(QoL ~ 1 + (1 | strata), data = qol_cancer)
dg <- marginaleffects::datagrid(model = m_null, by = "strata")

marginaleffects::plot_predictions(
  m_null,
  newdata = dg,
  by = "strata"
)

By visual inspection, I would guess that some of these predictions differ significantly from each other (point estimate not included in CI of some other predictions), while some seem to be more or less "equivalent" (point estimates covered by CIs of other predictions).

My hope was that these SEs/CIs from predictions can be used for comparisons...

vincentarelbundock commented 1 year ago

Yeah, I see what you mean. It's unfortunate, but frankly I don't have much hope of being able to solve this in the short or medium run...

strengejacke commented 1 year ago

Yes, no problem. I just hoped it would have been as easy as "we have standard errors, we can do comparisons" 😉

DrJerryTAO commented 1 year ago

By visual inspection, I would guess that some of these predictions differ significantly from each other (point estimate not included in CI of some other predictions), while some seem to be more or less "equivalent" (point estimates covered by CIs of other predictions).

Unfortunately "point estimate not included in CI of some other predictions" does not suggest significant difference, neither does "point estimates covered by CIs of other predictions" establish equivalence. For group comparisons, you would need to adjust the standard errors pair by pair. See Tryon, W. W. (2001). Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests. Psychological Methods, 6(4), 371–386. https://doi.org/10.1037/1082-989X.6.4.371

Since all group means share the same standard error SE = 2.05 (as the parameter uncertainty comes only from the fixed effect of the intercept, while random effects only deviate the group mean by the estimated random intercept) and are independent from each other, this would mean that we scale all SE by a factor of sqrt(2)/2 to shrink each CI length to its 70.7%. Nonoverlapping shrunk CIs of two groups suggest significant difference. If using a t distribution, you could also adjust the degree of freedom for unequal variances.