vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com

Standard error differences for avg_slopes(glm) with R margins / Python / Stata #840

Closed NickCH-K closed 1 year ago

NickCH-K commented 1 year ago


I am getting identical AMEs but different standard errors / z-scores / p-values when using avg_slopes() on a glm compared with R's margins::margins(), get_margeff() from Python's statsmodels, and margins in Stata. I believe all of these are supposed to use the delta method, so this looks like a bug, or perhaps something else odd is going on. My understanding is that the marginaleffects SEs are supposed to follow Stata's.

# Load packages and data
library(tidyverse)
library(marginaleffects); library(margins)
df <- causaldata::restaurant_inspections

m1 <- glm(Weekend ~ Year, data = df,
          family = binomial(link = 'logit'))
# Get the average marginal effect of year
avg_slopes(m1, variables = 'Year')
# z-score -2.32
m1 %>%
  margins(variables = 'Year') %>%
  summary()
# z-score -2.077

Note that using the same data and model I get a z-score of -2.068 in Python and -2.07 in Stata:

import statsmodels.formula.api as smf
from causaldata import restaurant_inspections
df = restaurant_inspections.load_pandas().data

# smf.logit wants the dependent variable to be numeric
df['Weekend'] = 1*df['Weekend']
m1 = smf.logit(formula = "Weekend ~ Year", data = df).fit()
m1.get_margeff().summary()
And the equivalent Stata commands:

causaldata restaurant_inspections.dta, use clear download
logit weekend year
margins, dydx(year)

sessionInfo:

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] margins_0.3.26         marginaleffects_0.13.0 modelsummary_1.4.1
 [4] lubridate_1.9.2        forcats_1.0.0          stringr_1.5.0
 [7] dplyr_1.1.2            purrr_1.0.1            readr_2.1.4
[10] tidyr_1.3.0            tibble_3.2.1           ggplot2_3.4.2
[13] tidyverse_2.0.0

loaded via a namespace (and not attached):
  [1] sandwich_3.0-2          rlang_1.1.1             magrittr_2.0.3
  [4] multcomp_1.4-24         prediction_0.3.14       compiler_4.3.0
  [7] causaldata_0.1.3        systemfonts_1.0.4       vctrs_0.6.2
 [10] rvest_1.0.3             httpcode_0.3.0          pkgconfig_2.0.3
 [13] crayon_1.5.2            fastmap_1.1.1           backports_1.4.1
 [16] ellipsis_0.3.2          effectsize_0.8.3        utf8_1.2.3
 [19] promises_1.2.0.1        rmarkdown_2.22          tzdb_0.4.0
 [22] ragg_1.2.5              xfun_0.39               cachem_1.0.8
 [25] jsonlite_1.8.5          later_1.3.1             uuid_1.1-0
 [28] parallel_4.3.0          R6_2.5.1                bslib_0.5.0
 [31] tables_0.9.17           stringi_1.7.12          parallelly_1.36.0
 [34] jquerylib_0.1.4         lmtest_0.9-40           estimability_1.4.1
 [37] Rcpp_1.0.10             assertthat_0.2.1        knitr_1.43
 [40] future.apply_1.11.0     zoo_1.8-12              parameters_0.21.1
 [43] Matrix_1.5-4            splines_4.3.0           httpuv_1.6.11
 [46] timechange_0.2.0        tidyselect_1.2.0        rstudioapi_0.14
 [49] codetools_0.2-19        curl_5.0.1              listenv_0.9.0
 [52] lattice_0.21-8          shiny_1.7.4             withr_2.5.0
 [55] bayestestR_0.13.1       flextable_0.9.1         askpass_1.1
 [58] coda_0.19-4             evaluate_0.21           survival_3.5-5
 [61] future_1.32.0           huxtable_5.5.2          zip_2.3.0
 [64] xml2_1.3.4              pillar_1.9.0            checkmate_2.2.0
 [67] DT_0.28                 insight_0.19.2          generics_0.1.3
 [70] hms_1.1.3               munsell_0.5.0           scales_1.2.1
 [73] globals_0.16.2          xtable_1.8-4            glue_1.6.2
 [76] gdtools_0.3.3           emmeans_1.8.6           tools_4.3.0
 [79] gfonts_0.2.0            data.table_1.14.8       webshot_0.5.4
 [82] mvtnorm_1.2-2           grid_4.3.0              datawizard_0.7.1
 [85] colorspace_2.1-0        performance_0.10.4      cli_3.6.1
 [88] kableExtra_1.3.4        textshaping_0.3.6       officer_0.6.2
 [91] fontBitstreamVera_0.1.1 fansi_1.0.4             viridisLite_0.4.2
 [94] svglite_2.1.1           gt_0.9.0                gtable_0.3.3
 [97] sass_0.4.6              digest_0.6.31           fontquiver_0.2.1
[100] TH.data_1.1-2           crul_1.4.0              htmlwidgets_1.6.2
[103] htmltools_0.5.5         lifecycle_1.0.3         httr_1.4.6
[106] mime_0.12               fontLiberation_0.1.0    openssl_2.0.6
[109] MASS_7.3-58.4
vincentarelbundock commented 1 year ago

Weird, I don’t get the same result as you:

library(causaldata)
library(marginaleffects)

packageVersion("marginaleffects")
# [1] '0.13.0.9002'

df <- causaldata::restaurant_inspections
m1 <- glm(Weekend ~ Year, data = df)
avg_slopes(m1, variables = "Year") |> print(digits = 5)
#
#  Term    Estimate Std. Error       z Pr(>|z|)   S      2.5 %      97.5 %
#  Year -0.00018692 8.9275e-05 -2.0938 0.036282 4.8 -0.0003619 -1.1944e-05
#
# Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high

Same as margins:

library(margins)
margins(m1) |> summary()
#  factor     AME     SE       z      p   lower   upper
#    Year -0.0002 0.0001 -2.0938 0.0363 -0.0004 -0.0000

NickCH-K commented 1 year ago

I haven't tried a gaussian glm, but I did get identical results between marginaleffects and margins with lm. I wonder if it's the logit that's causing the issue. Can you try a logit glm, as in my example?

vincentarelbundock commented 1 year ago

Ah yes, sorry I looked at this too quickly this morning.

The issue is that results appear very sensitive to the epsilon used in the finite difference for standard errors (dSlope/dCoefficient). I don’t know of a great principled way to choose a default that works well across the board. I would very much appreciate a recommendation if you have one!
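To make the mechanics concrete, here is a minimal hand-rolled sketch of that computation (not marginaleffects internals), using the built-in mtcars data instead of the issue's dataset: the AME is fixed, but its delta-method SE depends on the step used for the Jacobian d(AME)/d(beta).

```r
# Sketch: delta-method SE of the average marginal effect of mpg in a
# logit model, with the Jacobian d(AME)/d(beta) taken by a forward
# finite difference of step `eps` -- the knob under discussion.
m <- glm(am ~ mpg, data = mtcars, family = binomial)

ame <- function(beta, model, var = "mpg") {
  eta <- as.vector(model.matrix(model) %*% beta)
  mean(dlogis(eta) * beta[var])  # logit: dP/dX = dmu/deta * beta_var
}

se_ame <- function(model, eps) {
  b <- coef(model)
  f0 <- ame(b, model)
  # forward-difference Jacobian, perturbing one coefficient at a time
  J <- sapply(seq_along(b), function(j) {
    bj <- b
    bj[j] <- bj[j] + eps
    (ame(bj, model) - f0) / eps
  })
  # delta method: SE = sqrt(J' V J), with V the coefficient covariance
  sqrt(as.vector(t(J) %*% vcov(model) %*% J))
}

# The point estimate never changes, but the SE moves with eps:
sapply(c(1e-4, 1e-6, 1e-8, 1e-10), function(e) se_ame(m, e))
```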

As you can see, margins results are quite sensitive to that argument, and I'm not sure there's necessarily a good reason to prefer that package's default:

library(margins)
library(causaldata)
library(marginaleffects)

df <- causaldata::restaurant_inspections
m1 <- glm(Weekend ~ Year, data = df, family = binomial)

# default
summary(margins(m1))$z
# [1] -2.077301

eps = 10^-(4:12)
z = sapply(eps, \(e) summary(margins(m1, eps = e))$z)
data.frame(eps, z)
#     eps           z
# 1 1e-04 -0.56936327
# 2 1e-05 -3.64483782
# 3 1e-06 -2.16205387
# 4 1e-07 -2.07730103
# 5 1e-08 -2.06921540
# 6 1e-09 -2.06664568
# 7 1e-10 -2.01680807
# 8 1e-11 -4.99548105
# 9 1e-12 -0.09006251

One nice thing about marginaleffects is that you can separately control the epsilon used to compute the slope itself (dY/dX, via the eps argument) and the epsilon used to compute the derivatives for standard errors (d(dY/dX)/dB, via options). The results below differ slightly because we only manipulate one eps at a time, but the overall picture with marginaleffects is pretty similar:

z = sapply(eps, \(e) {
  options(marginaleffects_numDeriv = list( method = "simple", method.args = list(eps = e)))
  avg_slopes(m1, variables = "Year")$statistic
})
data.frame(eps, z)
#     eps           z
# 1 1e-04 -0.55100840
# 2 1e-05 -3.69565096
# 3 1e-06 -2.16384327
# 4 1e-07 -2.07724126
# 5 1e-08 -2.06707458
# 6 1e-09 -2.10695591
# 7 1e-10 -3.24735705
# 8 1e-11 -0.34801937
# 9 1e-12 -0.07159974

If we use the (much more expensive) Richardson method for differentiation instead of the simple method, we get:

options(marginaleffects_numDeriv = list(method = "Richardson"))
avg_slopes(m1, variables = "Year")$statistic
# [1] -2.06834

I have not played with the (also arbitrary?) Richardson tuning parameters, so I’m not sure if the results are sensitive. See the numDeriv docs.
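For anyone who wants to experiment, those knobs are exposed through numDeriv's method.args; this sketch (the values shown are numDeriv's documented defaults) differentiates a known function so the answer can be checked against the exact derivative:

```r
library(numDeriv)

# Richardson extrapolation with explicit tuning parameters:
# d = initial step as a fraction of |x|, r = number of iterations,
# v = step reduction factor between iterations (see ?numDeriv::grad).
grad(plogis, x = 1,
     method = "Richardson",
     method.args = list(d = 1e-4, r = 4, v = 2))
# should be very close to the exact derivative dlogis(1) ~= 0.19661
```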

I don’t know what statsmodels and Stata use as default for eps and why. Would be very curious to learn about it.

NickCH-K commented 1 year ago

Oh interesting, I wouldn't have thought to look at the eps option! Interesting that it matters so much. Thank you!

vincentarelbundock commented 1 year ago

Cool cool.

Will just reopen so I remember to try out the rules of thumb in this wiki article:

https://en.m.wikipedia.org/wiki/Numerical_differentiation#Practical_considerations_using_floating_point_arithmetic
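For reference, the forward-difference rule of thumb from that article translates to R roughly as follows (a sketch of the heuristic, not what any of these packages currently does):

```r
# Forward difference: truncation error grows with h while rounding error
# shrinks as eps_machine / h, so they balance near h ~ sqrt(machine
# epsilon), scaled by |x| so the step is representable in x + h.
h <- function(x) sqrt(.Machine$double.eps) * pmax(abs(x), 1)
h(0)    # ~1.49e-08
h(1e6)  # ~14.9 -- a tiny fixed eps would be swallowed at this scale
```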

vincentarelbundock commented 1 year ago

FYI, I think numeric instability is potentially a big problem, and I want to make it easier for users to try out different options. First, I improved the default procedure for determining the step size. Then, in the dev version on GitHub, there is a new argument that lets you do things like:

library(causaldata)
library(marginaleffects)

df <- causaldata::restaurant_inspections
m1 <- glm(Weekend ~ Year, data = df, family = binomial)

avg_slopes(m1, numderiv = "richardson")
# 
#  Term  Estimate Std. Error     z Pr(>|z|)   S     2.5 %    97.5 %
#  Year -0.000187   9.05e-05 -2.07   0.0386 4.7 -0.000365 -9.81e-06
# 
# Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high

avg_slopes(m1, numderiv = list("fdcenter", eps = 1e-10))
# 
#  Term  Estimate Std. Error    z Pr(>|z|)   S    2.5 %   97.5 %
#  Year -0.000187    0.00017 -1.1    0.269 1.9 -0.00052 0.000145
# 
# Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high

Thanks for raising the issue!