Inconsistency with "transform =" in "avg_predictions()"

arielhasidim commented 1 year ago

I'm using the transform = argument in the avg_predictions() function in order to present the treatment effect on the predicted proportions of a binary outcome. The documentation regarding the transform argument states that:

A function applied to unit-level adjusted predictions and confidence intervals just before the function returns results.

That's why I think it's odd that the transform = function(x) x shows different result than without the trandform.

For example:

example <- glm(vs ~ am * (mpg + cyl + disp + hp + drat + wt), 
                data = mtcars, 
                family = binomial())

avg_predictions(example,
                variables = "am"
)

Gives:

am	estimate	p.value	conf.low	conf.high
0	1	0.999908	2.2204E-16	1
1	2.2204E-16	0.9997359	2.2204E-16	1

While:

avg_predictions(example,
                variables = "am",
                transform = function(x) x
)

Gives:	am	estimate	p.value	conf.low	conf.high
0	0.5625	0	0.5624917	0.5625083
1	0.2499351	0.9764413	-16.338353	16.8382229

sessionInfo():

R version 4.1.2 (2021-11-01) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS 13.1

Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] ggpubr_0.6.0 scales_1.2.1 ggsci_2.9 haven_2.5.1 bigrquery_1.4.1
[6] quickmatch_0.2.1 distances_0.1.9 AER_1.2-10 survival_3.4-0 sandwich_3.0-2
[11] lmtest_0.9-40 zoo_1.8-11 car_3.1-1 carData_3.0-5 pwr_1.3-0
[16] quantreg_5.94 SparseM_1.81 gtsummary_1.7.0 flextable_0.8.3 labelled_2.10.0
[21] readxl_1.4.2 marginaleffects_0.11.2 MatchIt_4.5.0 lubridate_1.9.2 forcats_1.0.0
[26] stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[31] tibble_3.1.8 ggplot2_3.4.1 tidyverse_2.0.0 DBI_1.1.3 pacman_0.5.1

loaded via a namespace (and not attached): [1] colorspace_2.1-0 ggsignif_0.6.4 rsconnect_0.8.29 parameters_0.20.2 base64enc_0.1-3 fs_1.6.1
[7] rstudioapi_0.14 farver_2.1.1 MatrixModels_0.5-1 bit64_4.0.5 fansi_1.0.4 xml2_1.3.3
[13] splines_4.1.2 knitr_1.42 Formula_1.2-4 jsonlite_1.8.4 gt_0.8.0 broom_1.0.3
[19] dbplyr_2.3.2 compiler_4.1.2 httr_1.4.4 backports_1.4.1 assertthat_0.2.1 Matrix_1.5-1
[25] fastmap_1.1.0 gargle_1.2.1 cli_3.6.0 htmltools_0.5.4 prettyunits_1.1.1 tools_4.1.2
[31] gtable_0.3.1 glue_1.6.2 rappdirs_0.3.3 Rcpp_1.0.10 optmatch_0.10.6 cellranger_1.1.0
[37] vctrs_0.5.2 broom.helpers_1.11.0 insight_0.19.1 xfun_0.37 brio_1.1.3 timechange_0.1.1
[43] lifecycle_1.0.3 rstatix_0.7.2 scclust_0.2.3 MASS_7.3-58.1 rlemon_0.2.1 hms_1.1.3
[49] yaml_2.3.7 curl_5.0.0 gdtools_0.2.4 sass_0.4.2 stringi_1.7.12 bayestestR_0.13.0
[55] checkmate_2.1.0 zip_2.2.2 rlang_1.0.6 pkgconfig_2.0.3 systemfonts_1.0.4 commonmark_1.8.1
[61] evaluate_0.20 lattice_0.20-45 labeling_0.4.2 cowplot_1.1.1 bit_4.0.4 tidyselect_1.2.0
[67] magrittr_2.0.3 R6_2.5.1 generics_0.1.3 pillar_1.8.1 withr_2.5.0 datawizard_0.6.5
[73] abind_1.4-5 crayon_1.5.2 uuid_1.1-0 utf8_1.2.3 tzdb_0.3.0 rmarkdown_2.18
[79] officer_0.4.4 progress_1.2.2 grid_4.1.2 data.table_1.14.6 digest_0.6.31 openssl_2.0.4
[85] munsell_0.5.0 askpass_1.1

vincentarelbundock commented 1 year ago

This is likely because of the automatic back transform described on this page: https://vincentarelbundock.github.io/marginaleffects/articles/predictions.html

You can probably specify the type argument explicitly to get identical results.

arielhasidim commented 1 year ago

Thank's @vincentarelbundock!

I did manage to replicate the identical results using the type argument, but I'm afraid there are still some inconsistencies:

transform = NULL argument still gives a different result than transform = function(x) x (it turns out the same as without the transform argument).
According to marginaleffects:::type_dictionary df, glm class should get type = "response" by default, but stating type = "response" argument is how I replicated the result that I got with the transform = function(x) x argument (which differs from the result without type and transform arguments, although this should be the default).
The other type option that is available for glm in this dictionary is "link", which gives a third different result. I'm not able to reproduce the original result with any of the available type options.

Leaving here the third result:

example <- glm(vs ~ am * (mpg + cyl + disp + hp + drat + wt), 
                data = mtcars, 
                family = binomial())

avg_predictions(example,
                variables = "am",
                type= "link"
)

Resulted in:	am	estimate	std.error	statistic	p.value	conf.low	conf.high
0	19.58492	169,918.80	0.00011526	0.999908	-333,015.20	333,054.30
1	-253.43351	765,612.70	-0.000331	0.9997359	##########	##########

vincentarelbundock commented 1 year ago

It’s easier to see the equivalences with a model that converges. See the equivalences below. And please read the paragraph here, paying special attention to the first sentence: https://vincentarelbundock.github.io/marginaleffects/articles/reference/predictions.html#details

library(marginaleffects)

example <- glm(vs ~ am * mpg, 
                data = mtcars, 
                family = binomial())

avg_predictions(example,
                variables = "am",
                type = "response")
# 
#  am Estimate Std. Error     z Pr(>|z|)    S 2.5 % 97.5 %
#   0    0.553     0.0496 11.16   <0.001 93.6 0.456  0.650
#   1    0.296     0.0856  3.45   <0.001 10.8 0.128  0.464
# 
# Columns: am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high

avg_predictions(example,
                variables = "am",
                transform = \(x) x)
# 
#  am Estimate Pr(>|z|)    S 2.5 % 97.5 %
#   0    0.553   <0.001 93.6 0.456  0.650
#   1    0.296   <0.001 10.8 0.128  0.464
# 
# Columns: am, estimate, p.value, s.value, conf.low, conf.high

predictions(example,
            variables = "am",
            type = "link",
            transform = \(x) plogis(x)) |>
    aggregate(estimate ~ am, data = _, FUN = mean)
#   am  estimate
# 1  0 0.5532474
# 2  1 0.2958730

vincentarelbundock / marginaleffects

Inconsistency with "transform =" in "avg_predictions()" #812