rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
340 stars 30 forks source link

Error with model-averaged objects with poly() in the model formula #449

Closed MarcRieraDominguez closed 8 months ago

MarcRieraDominguez commented 9 months ago

Describe the bug

I am unable to obtain estimated marginal means from model-averaged objects (class averaging, obtained with MuMIn package), when the model-averaged objects contain a quadratic term fitted with poly(x, 2, raw = TRUE). Quadratic terms fitted with I(x^2) return a warning but yield a sensible result. The error with poly() appears to be related to parsing, which is completely outside of my current R knowledge.

Quadratic terms and model averaging are discussed in the documentation, but are not considered at the same time.

To reproduce

## Set options for dredge() and load libraries

options(na.action = "na.fail")

library(emmeans)
#> Warning: package 'emmeans' was built under R version 4.2.3
library(MuMIn)
#> Warning: package 'MuMIn' was built under R version 4.2.3

## Prepare dataset

mtcars$am <- factor(mtcars$am)

## Error with averaging + poly()
quad.poly.mod <- lm(disp ~ poly(mpg, 2, raw = TRUE) + am + gear, mtcars)

# No problem if no averaging is there
emmeans::emmeans(quad.poly.mod, specs = pairwise ~ am, data = mtcars)
#> $emmeans
#>  am emmean   SE df lower.CL upper.CL
#>  0     209 19.1 27      170      248
#>  1     196 27.9 27      138      253
#> 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast  estimate   SE df t.ratio p.value
#>  am0 - am1       13 38.2 27   0.341  0.7357

# Problems arise when calculating emmeans on averaged models
quad.poly.dredge <- dredge(quad.poly.mod)
#> Fixed term is "(Intercept)"
quad.poly.avg <- model.avg(quad.poly.dredge, fit = TRUE)
emmeans::emmeans(quad.poly.avg, specs = pairwise ~ am, data = mtcars)
#> Error in parse(text = expr): <text>:1:25: unexpected numeric constant
#> 1: poly(mpg, 2, raw = TRUE)1
#>                             ^

## For completition: models without averaging, without quadratic terms, and with wuadratic terms with I(x^2)
linear.mod <- lm(disp ~ mpg + am + gear, mtcars)
quad.i.mod <- lm(disp ~ mpg + I(mpg ^ 2) + am + gear, mtcars)

emmeans::emmeans(linear.mod, specs = pairwise ~ am, data = mtcars)
#> $emmeans
#>  am emmean   SE df lower.CL upper.CL
#>  0     226 20.8 28      183      268
#>  1     238 27.8 28      181      295
#> 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast  estimate   SE df t.ratio p.value
#>  am0 - am1      -12 42.5 28  -0.282  0.7802
emmeans::emmeans(quad.i.mod, specs = pairwise ~ am, data = mtcars)
#> $emmeans
#>  am emmean   SE df lower.CL upper.CL
#>  0     209 19.1 27      170      248
#>  1     196 27.9 27      138      253
#> 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast  estimate   SE df t.ratio p.value
#>  am0 - am1       13 38.2 27   0.341  0.7357

## Averaging without quadratic: works fine
linear.dredge <- dredge(linear.mod)
#> Fixed term is "(Intercept)"
linear.avg <- model.avg(linear.dredge, fit = TRUE)
emmeans::emmeans(linear.avg, specs = pairwise ~ am, data = mtcars)
#> $emmeans
#>  am emmean   SE df lower.CL upper.CL
#>  0     232 14.7 28      202      262
#>  1     229 17.6 28      193      265
#> 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast  estimate   SE df t.ratio p.value
#>  am0 - am1     3.41 22.2 28   0.153  0.8794

## Averaging with quadratic declared with I(x^2): works fine
quad.i.dredge <- dredge(quad.i.mod)
#> Fixed term is "(Intercept)"
quad.i.avg <- model.avg(quad.i.dredge, fit = TRUE)
emmeans::emmeans(quad.i.avg, specs = pairwise ~ am, data = mtcars)
#> Warning in (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"), : There are unevaluated constants in the response formula
#> Auto-detection of the response transformation may be incorrect
#> $emmeans
#>  am emmean   SE df lower.CL upper.CL
#>  0     208 16.3 27      174      241
#>  1     198 20.2 27      156      239
#> 
#> Results are given on the identity (not the response) scale. 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast  estimate   SE df t.ratio p.value
#>  am0 - am1     9.84 22.8 27   0.431  0.6700
#> 
#> Note: contrasts are still on the identity scale

sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.utf8  LC_CTYPE=Spanish_Spain.utf8   
#> [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C                  
#> [5] LC_TIME=Spanish_Spain.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] MuMIn_1.47.5  emmeans_1.8.8
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.2.0     highr_0.9          tools_4.2.0        digest_0.6.29     
#>  [5] nlme_3.1-157       evaluate_0.15      lifecycle_1.0.3    lattice_0.20-45   
#>  [9] rlang_1.1.1        reprex_2.0.2       Matrix_1.4-1       cli_3.6.1         
#> [13] rstudioapi_0.13    yaml_2.3.5         mvtnorm_1.1-3      xfun_0.40         
#> [17] fastmap_1.1.0      coda_0.19-4        withr_2.5.0        stringr_1.5.0     
#> [21] knitr_1.39         fs_1.5.2           vctrs_0.6.3        stats4_4.2.0      
#> [25] grid_4.2.0         glue_1.6.2         survival_3.3-1     rmarkdown_2.14    
#> [29] multcomp_1.4-19    TH.data_1.1-1      magrittr_2.0.3     codetools_0.2-18  
#> [33] htmltools_0.5.6    splines_4.2.0      MASS_7.3-57        xtable_1.8-4      
#> [37] sandwich_3.0-1     estimability_1.4.1 stringi_1.7.6      zoo_1.8-10

Created on 2023-10-12 with reprex v2.0.2

Created on 2023-10-12 with reprex v2.0.2

Expected behavior

The estimated marginal means, as calculated by emmeans on objects that contain quadratic terms fitted with I(x^2).

Additional context

I came across this when working with model-averaged binomial GLMs (n successes / k failures), containing pairwise interactions (numerical:categorical, numerical:numerical), and polynomial terms fitted with poly(x, 2, raw = TRUE).

rvlenth commented 9 months ago

Hmmmm. Well the issue seems to be the way I try to match the averaged coefficients with columns in the model matrix. In one part of the code, I get the model matrix for each variable instead of each term -- which worked for many models but not this one. I'll see what I can do. These averaged models are really dicey because they often include coefficients that are not in the main model,, such as for the first level of a factor. This could take me a while to puzzle through.

rvlenth commented 9 months ago

I posted an earlier comment that was dumb because it was based on the wrong model. However, I do now get:

> emmeans(quad.poly.avg, specs = pairwise ~ am, data = mtcars)
$emmeans
 am emmean   SE df lower.CL upper.CL
 0     207 15.8 27      174      239
 1     197 19.6 27      157      237

Results are given on the poly (not the response) scale. 
Confidence level used: 0.95 

$contrasts
 contrast  estimate   SE df t.ratio p.value
 am0 - am1       10 22.8 27   0.439  0.6643

Note: contrasts are still on the poly scale 

Warning message:
In (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"),  :
  There are unevaluated constants in the response formula
Auto-detection of the response transformation may be incorrect

The stuff about response transformations is mysterious and annoying... but similar to what you got with manual specs. The estimates differ somewhat from what you have. Not sure why.

rvlenth commented 9 months ago

I think I've got it now; will push up soon. The results for quad.i.avg are different because the coefficients are different -- check it out. And I have figured out what was happening with the warnings about response transformations, and we don't get those any more.

MarcRieraDominguez commented 8 months ago

Hi Russell, I've tested version 1.8.9, I no longer get errors with averaged models with poly() in the formula. I'll be closing this issue. Thank you very much!