vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com
Other
470 stars 47 forks source link

`lmer()` random effects standard errors #874

Closed OmidGhasemi21 closed 1 year ago

OmidGhasemi21 commented 1 year ago

Hi @vincentarelbundock,

I am trying to extract by group marginal means in lmer() with group as a random effect and I got weird results. I appreciate your help.

library(marginaleffects)
library(lme4)
library(tidyverse)

data <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv") %>%
  drop_na()

model <- lmer(flipper_length_mm ~ bill_depth_mm + (bill_depth_mm | species),
           data = data)

I want to extract the marginal means (sorry for the loose definition here) by species (basically, what the coef() returns plus CI).

Here is my code with slopes(). I do not understand why I get similar SE for all levels of species and why the CI width is the same:

slopes(model,
       variables = "bill_depth_mm",
       newdata = datagrid(bill_depth_mm = c(13:22),
                          species = unique),
       by= c("species")) %>%
  tibble() %>%
  select(species, estimate, std.error, conf.low, conf.high) %>%
  mutate(ci_width= conf.high - conf.low)
# species   estimate std.error conf.low conf.high ci_width
# 1 Adelie        1.84     0.919   0.0332      3.64     3.60
# 2 Gentoo        4.74     0.919   2.94        6.55     3.60
# 3 Chinstrap     3.14     0.919   1.34        4.94     3.60

Is there a way to get by group estimates using marginaleffectswhen group is a random intercept?

PS. Using broom_mixed::tidy() and a bit of manual calculation gave me these values. The estimates are the same, but the CIs are different.

broom.mixed::tidy(model, effects = "ran_vals", conf.int = TRUE)%>%
  filter(term == "bill_depth_mm") %>%
  # add the fixed coefficient  to ranef values to get coefs
  mutate(estimate = estimate + 3.24,
         conf.low = conf.low + 3.24,
         conf.high = conf.high + 3.24)  %>%
  select(species=group, estimate, std.error, conf.low, conf.high) %>%
  mutate(ci_width= conf.high - conf.low)
# species estimate std.error conf.low conf.high ci_width
# 1 species     1.84     0.326     1.20      2.48     1.28
# 2 species     3.14     0.433     2.29      3.99     1.70
# 3 species     4.74     0.434     3.89      5.60     1.70
vincentarelbundock commented 1 year ago

Unfortunately, that's not possible at the moment, and I don't expect this to be improved in the near future.

The problem is that slopes() must use the lme4:predict() function. In contrast, broom.mixed::tidy() exploits features of the lme4::ranef() function that are not available in lme4::predict().

OmidGhasemi21 commented 1 year ago

Thanks for the clarification @vincentarelbundock. If I am correct, calculating by group estimates is not a problem with brms and marginaleffects (see below), right?

library(brms)
model_brms <- brm(flipper_length_mm ~ bill_depth_mm + (bill_depth_mm | species),
              data = data)

slopes(model_brms,
       newdata = datagrid(bill_depth_mm = c(13:22),
                          species = unique),
       re_formula = NULL) %>%
  posterior_draws() %>%
  group_by(species) %>%
  tidybayes::mean_hdi(draw)
# species    draw .lower .upper .width .point .interval
# 1 Adelie     1.97   1.20   2.64   0.95 mean   hdi      
# 2 Chinstrap  2.99   2.06   4.09   0.95 mean   hdi      
# 3 Gentoo     4.62   3.76   5.53   0.95 mean   hdi  

And regarding the frequentist approach, how do you suggest extracting such estimates? Are you aware of any packages that can do this? I guess emmeans (as specified here https://github.com/rvlenth/emmeans/issues/118) and easystats::estimate_slopes are not helpful here.

manually adding the fixed effect estimate to ranef() values (as I did with tidy() above) is an option, but I am not sure if the estimate should be added to CIs as well or not. Any ideas?

Another option is to include the coefficients in the fixed effect part as in:

model2 <- lm(flipper_length_mm ~ bill_depth_mm * species,
              data = data)

slopes(model2,
       variables= "bill_depth_mm",
       by= c("species")) 
# Term    Contrast   species Estimate Std. Error    z Pr(>|z|)    S 2.5 % 97.5 %
# bill_depth_mm mean(dY/dX) Adelie        1.66      0.383 4.34   <0.001 16.1 0.911   2.41
# bill_depth_mm mean(dY/dX) Gentoo        4.75      0.526 9.03   <0.001 62.3 3.716   5.78
# bill_depth_mm mean(dY/dX) Chinstrap     3.64      0.606 6.02   <0.001 29.1 2.457   4.83

Sorry for all the questions. It is really surprising that calculating group-level random effects is so challenging. Anyway, thanks for your amazing package. I am reading the documents these days as I am shifting from emmeans to marginaleffects.

vincentarelbundock commented 1 year ago

Yep, I expect that things should work as you expect for brms models.

Sorry, but I can't really think of a solution for lme4 models. ranef() is really just a special case where the quantities happen to match the output of marginaleffects, but the behind-the-scenes computation is very different. Every single quantity produced by marginaleffects functions relies on predict(). For instance, computing standard errors requires us to compute numerical derivatives of the predict() output.

Maybe there's a bootstrap or simulation-based alternative, but I don't have a good mental model of that right now so can't offer specific advice. Let me know if you find a solution.