vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com
Other
428 stars 45 forks source link

Inconsistency with "transform =" in "avg_predictions()" #812

Closed arielhasidim closed 1 year ago

arielhasidim commented 1 year ago

I'm using the transform = argument in the avg_predictions() function in order to present the treatment effect on the predicted proportions of a binary outcome. The documentation regarding the transform argument states that:

A function applied to unit-level adjusted predictions and confidence intervals just before the function returns results.

That's why I think it's odd that the transform = function(x) x shows different result than without the trandform.

For example:

example <- glm(vs ~ am * (mpg + cyl + disp + hp + drat + wt), 
                data = mtcars, 
                family = binomial())

avg_predictions(example,
                variables = "am"
)

Gives:

am estimate p.value conf.low conf.high
0 1 0.999908 2.2204E-16 1
1 2.2204E-16 0.9997359 2.2204E-16 1

While:

avg_predictions(example,
                variables = "am",
                transform = function(x) x
)
Gives: am estimate p.value conf.low conf.high
0 0.5625 0 0.5624917 0.5625083
1 0.2499351 0.9764413 -16.338353 16.8382229

sessionInfo():

R version 4.1.2 (2021-11-01) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS 13.1

Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] ggpubr_0.6.0 scales_1.2.1 ggsci_2.9 haven_2.5.1 bigrquery_1.4.1
[6] quickmatch_0.2.1 distances_0.1.9 AER_1.2-10 survival_3.4-0 sandwich_3.0-2
[11] lmtest_0.9-40 zoo_1.8-11 car_3.1-1 carData_3.0-5 pwr_1.3-0
[16] quantreg_5.94 SparseM_1.81 gtsummary_1.7.0 flextable_0.8.3 labelled_2.10.0
[21] readxl_1.4.2 marginaleffects_0.11.2 MatchIt_4.5.0 lubridate_1.9.2 forcats_1.0.0
[26] stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[31] tibble_3.1.8 ggplot2_3.4.1 tidyverse_2.0.0 DBI_1.1.3 pacman_0.5.1

loaded via a namespace (and not attached): [1] colorspace_2.1-0 ggsignif_0.6.4 rsconnect_0.8.29 parameters_0.20.2 base64enc_0.1-3 fs_1.6.1
[7] rstudioapi_0.14 farver_2.1.1 MatrixModels_0.5-1 bit64_4.0.5 fansi_1.0.4 xml2_1.3.3
[13] splines_4.1.2 knitr_1.42 Formula_1.2-4 jsonlite_1.8.4 gt_0.8.0 broom_1.0.3
[19] dbplyr_2.3.2 compiler_4.1.2 httr_1.4.4 backports_1.4.1 assertthat_0.2.1 Matrix_1.5-1
[25] fastmap_1.1.0 gargle_1.2.1 cli_3.6.0 htmltools_0.5.4 prettyunits_1.1.1 tools_4.1.2
[31] gtable_0.3.1 glue_1.6.2 rappdirs_0.3.3 Rcpp_1.0.10 optmatch_0.10.6 cellranger_1.1.0
[37] vctrs_0.5.2 broom.helpers_1.11.0 insight_0.19.1 xfun_0.37 brio_1.1.3 timechange_0.1.1
[43] lifecycle_1.0.3 rstatix_0.7.2 scclust_0.2.3 MASS_7.3-58.1 rlemon_0.2.1 hms_1.1.3
[49] yaml_2.3.7 curl_5.0.0 gdtools_0.2.4 sass_0.4.2 stringi_1.7.12 bayestestR_0.13.0
[55] checkmate_2.1.0 zip_2.2.2 rlang_1.0.6 pkgconfig_2.0.3 systemfonts_1.0.4 commonmark_1.8.1
[61] evaluate_0.20 lattice_0.20-45 labeling_0.4.2 cowplot_1.1.1 bit_4.0.4 tidyselect_1.2.0
[67] magrittr_2.0.3 R6_2.5.1 generics_0.1.3 pillar_1.8.1 withr_2.5.0 datawizard_0.6.5
[73] abind_1.4-5 crayon_1.5.2 uuid_1.1-0 utf8_1.2.3 tzdb_0.3.0 rmarkdown_2.18
[79] officer_0.4.4 progress_1.2.2 grid_4.1.2 data.table_1.14.6 digest_0.6.31 openssl_2.0.4
[85] munsell_0.5.0 askpass_1.1

vincentarelbundock commented 1 year ago

This is likely because of the automatic back transform described on this page: https://vincentarelbundock.github.io/marginaleffects/articles/predictions.html

You can probably specify the type argument explicitly to get identical results.

arielhasidim commented 1 year ago

Thank's @vincentarelbundock!

I did manage to replicate the identical results using the type argument, but I'm afraid there are still some inconsistencies:

  1. transform = NULL argument still gives a different result than transform = function(x) x (it turns out the same as without the transform argument).
  2. According to marginaleffects:::type_dictionary df, glm class should get type = "response" by default, but stating type = "response" argument is how I replicated the result that I got with the transform = function(x) x argument (which differs from the result without type and transform arguments, although this should be the default).
  3. The other type option that is available for glm in this dictionary is "link", which gives a third different result. I'm not able to reproduce the original result with any of the available type options.

Leaving here the third result:

example <- glm(vs ~ am * (mpg + cyl + disp + hp + drat + wt), 
                data = mtcars, 
                family = binomial())

avg_predictions(example,
                variables = "am",
                type= "link"
)
Resulted in: am estimate std.error statistic p.value conf.low conf.high
0 19.58492 169,918.80 0.00011526 0.999908 -333,015.20 333,054.30
1 -253.43351 765,612.70 -0.000331 0.9997359 ########## ##########
vincentarelbundock commented 1 year ago

It’s easier to see the equivalences with a model that converges. See the equivalences below. And please read the paragraph here, paying special attention to the first sentence: https://vincentarelbundock.github.io/marginaleffects/articles/reference/predictions.html#details

library(marginaleffects)

example <- glm(vs ~ am * mpg, 
                data = mtcars, 
                family = binomial())

avg_predictions(example,
                variables = "am",
                type = "response")
# 
#  am Estimate Std. Error     z Pr(>|z|)    S 2.5 % 97.5 %
#   0    0.553     0.0496 11.16   <0.001 93.6 0.456  0.650
#   1    0.296     0.0856  3.45   <0.001 10.8 0.128  0.464
# 
# Columns: am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high

avg_predictions(example,
                variables = "am",
                transform = \(x) x)
# 
#  am Estimate Pr(>|z|)    S 2.5 % 97.5 %
#   0    0.553   <0.001 93.6 0.456  0.650
#   1    0.296   <0.001 10.8 0.128  0.464
# 
# Columns: am, estimate, p.value, s.value, conf.low, conf.high

predictions(example,
            variables = "am",
            type = "link",
            transform = \(x) plogis(x)) |>
    aggregate(estimate ~ am, data = _, FUN = mean)
#   am  estimate
# 1  0 0.5532474
# 2  1 0.2958730