rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
341 stars 30 forks source link

emtrends() fails with class "averaging": unable to find the data #413

Closed MarcRieraDominguez closed 1 year ago

MarcRieraDominguez commented 1 year ago

Describe the bug

The emtrends() function in version 1.8.5 does not compute slopes with models of class "averaging". If I understand correctly, it cannot find the dataset, even if it is supplied to emtrends() as a data argument. emmeans() works fine with averaged models.

To reproduce


R version 4.2.0

library(emmeans) # 1.8.5
#> Warning: package 'emmeans' was built under R version 4.2.3
library(MuMIn) # 1.46.0 Dredge, model averaging
library(tidyverse) # 1.3.1 pipes
#> Warning: package 'ggplot2' was built under R version 4.2.2

# Will use mtcars data as toy dataset, using am as binary response (automatic/manual)
dataset <- mtcars
dataset$vs <- factor(dataset$vs)
# str(dataset)

# Linear model: works fine
mod <- glm(am ~ wt + mpg + vs, family = binomial(link = "logit"), data = dataset)
# summary(mod)

emmeans::emtrends(mod, specs = ~ wt, var = "wt", trans = "response", data = dataset)
#> In 'ref_grid()', use 'regrid = ...' rather than 'transform = ...' to avoid this message.
#> In 'ref_grid()', use 'regrid = ...' rather than 'transform = ...' to avoid this message.
#>    wt wt.trend    SE  df asymp.LCL asymp.UCL
#>  3.22   -0.834 0.479 Inf     -1.77     0.105
#> 
#> Results are averaged over the levels of: vs 
#> Confidence level used: 0.95

emmeans::emmeans(mod, specs = pairwise ~ vs, adjust = "Tukey", type = "response", data = dataset)
#> $emmeans
#>  vs   prob     SE  df asymp.LCL asymp.UCL
#>  0  0.6578 0.3233 Inf  0.103245     0.970
#>  1  0.0222 0.0435 Inf  0.000441     0.538
#> 
#> Confidence level used: 0.95 
#> Intervals are back-transformed from the logit scale 
#> 
#> $contrasts
#>  contrast  odds.ratio  SE  df null z.ratio p.value
#>  vs0 / vs1       84.8 241 Inf    1   1.562  0.1182
#> 
#> Tests are performed on the log odds ratio scale

# Model averaging: emtnreds() does not work, but emmeans() works fine
options(na.action = "na.fail") # necessary for dredge()
mod.avg <-
  MuMIn::dredge(mod) %>% 
  subset(delta <= 6) %>% 
  MuMIn::model.avg(revised.var = TRUE,
                   fit = TRUE)
#> Fixed term is "(Intercept)"
# summary(mod.avg)

emmeans::emtrends(mod.avg, specs = ~ mpg, var = "mpg", trans = "response", data = dataset)
#> In 'ref_grid()', use 'regrid = ...' rather than 'transform = ...' to avoid this message.
#> In 'ref_grid()', use 'regrid = ...' rather than 'transform = ...' to avoid this message.
#> Error in model.frame.default(formula, data = data, ...) : 
#>   'data' must be a data.frame, environment, or list
#> Error in ref_grid(object = structure(list(msTable = structure(list(df = c(3, : Perhaps a 'data' or 'params' argument is needed

emmeans::emmeans(mod.avg, specs = pairwise ~ vs, adjust = "Tukey", type = "response", data = dataset)
#> $emmeans
#>  vs   prob     SE df lower.CL upper.CL
#>  0  0.5903 0.3089 28 0.095272    0.952
#>  1  0.0362 0.0738 28 0.000493    0.741
#> 
#> Confidence level used: 0.95 
#> Intervals are back-transformed from the logit scale 
#> 
#> $contrasts
#>  contrast  odds.ratio  SE df null t.ratio p.value
#>  vs0 / vs1       38.4 115 28    1   1.221  0.2322
#> 
#> Tests are performed on the log odds ratio scale

Created on 2023-03-29 with reprex v2.0.2

Expected behavior

I would expect the same output from emtnreds() applied to the linear model: slope with uper and lower confidence interval at the untransformed scale.

Additional context

My aim is to calculate marginal means and slopes from model-averaged binomial GLMs (stats::glm()). The models include 9 continuous predictors and 1 factor, and does not contain interactions. The vignettes explicilty mention models of class averaging (https://cran.r-project.org/web/packages/emmeans/vignettes/models.html#I), but emtnreds() is not mentioned. Apologies if I have missed it!

Thank you for your time!

rvlenth commented 1 year ago

I thought I had this covered in a recent update: emtrends() calls ref_grid() twice, and a coding error caused it to not pass the data correctly the second time. That did indeed fix part of it. However, there was another subtle error than came up in the recover_data.averaging() whereby it (incorrectly) needed the response variable. That's probably TMI, but anyway I have it working:

> emtrends(mod.avg, specs = ~ mpg, var = "mpg", trans = "response", data = dataset)
In 'ref_grid()', use 'regrid = ...' rather than 'transform = ...' to avoid this message.
In 'ref_grid()', use 'regrid = ...' rather than 'transform = ...' to avoid this message.
  mpg mpg.trend     SE df lower.CL upper.CL
 20.1  -0.00414 0.0255 28  -0.0564   0.0481

Results are averaged over the levels of: vs 
Confidence level used: 0.95 

Please note the message that shows twice (due to the two calls to ref_grid()). Instead of trans = "response", you should specify regrid = "response". You had those messages too, and they didn't go away. But also, it's pretty unusual to have the same variable in specs and var. Here's another example to show some clean results with no messages:

> emtrends(mod.avg, specs = ~ vs, var = "mpg", regrid = "response", data = dataset)
 vs mpg.trend      SE df lower.CL upper.CL
 0   -0.00725 0.04443 28  -0.0982   0.0838
 1   -0.00104 0.00687 28  -0.0151   0.0130

Confidence level used: 0.95 

BTW, no big deal, but I would have preferred you use the native pipe operator |> thence requiring me to load fewer packages to reproduce your results.

The updated code will be pushed up soon (there'll be a message on this page when that happens), then you can install from GitHub and it should work.

Thanks for reporting this.

MarcRieraDominguez commented 1 year ago

Super! Thank you for all your work!

rvlenth commented 1 year ago

I think this is resolved, so am closing this issue.