vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com
Other
466 stars 47 forks source link

Standard errors not the same as other software #849

Closed Generalized closed 1 year ago

Generalized commented 1 year ago

Dear Authors,

With the following data:

org_data_glm <- structure(list(OverallSuccess = structure(c(2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("Success", 
"Failure"), class = "factor"), Arm = structure(c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), levels = c("A", 
"B"), class = "factor")), row.names = c(NA, -111L), class = "data.frame")

I first run the prop.test() to calculate the 90% 2-sided CI for the difference in % of successes:

# Let's check the % of successes:

> org_data_glm %>% 
group_by(Arm, OverallSuccess) %>% 
count() %>% 
xtabs(n ~ Arm + OverallSuccess, data=.)
   OverallSuccess
Arm Success Failure
  A       0      56
  B       1      54

# And now let's calculate the 90% CI for the difference in %
# Ignore the warning, it doesn't matter here.

> org_data_glm %>% 
group_by(Arm, OverallSuccess) %>% 
count() %>% 
xtabs(n ~ Arm + OverallSuccess, data=.) %>% 
prop.test(conf.level=0.9, correct=FALSE)

    2-sample test for equality of proportions without continuity correction

data:  .
X-squared = 1.0274, df = 1, p-value = 0.3108
alternative hypothesis: two.sided
90 percent confidence interval:
 -0.04781512  0.01145149
sample estimates:
    prop 1     prop 2 
0.00000000 0.01818182 

Warning message:
In prop.test(., conf.level = 0.9, correct = FALSE) :
  Chi-squared approximation may be incorrect

Let's note the lower bound of the L_CI = -0.04781512

Now, we will reproduce this with the classic 2-sample Wald's "z" procedure:

>  PropCIs::wald2ci(0, 56, 1, 55, conf.level = 0.9, adjust="Wald")
data:  

90 percent confidence interval:
 -0.04781512  0.01145149
sample estimates:
[1] -0.01818182

OK, these methods use the same formulas, and L_CI = -0.04781512

Now, with the average marginal effect over the logistic regression = it's exact equivalent. First, let's try the margins package (reproducing Stata)

mod <- glm(OverallSuccess ~ Arm, family = binomial(link = 'logit'), data=org_data_glm)

>     summary(margins::margins(mod), level = 0.9)
 factor     AME     SE       z      p   lower  upper
   ArmB -0.0182 0.0180 -1.0092 0.3129 -0.0478 0.0115

More precisely: 
>     summary(margins::margins(mod), level = 0.9)$lower
[1] -0.04781512

Good! The d L_CI = -0.04781512

Now with marginaleffects:

>     avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9)

 Term Contrast Estimate Std. Error     z Pr(>|z|)   S   5.0 % 95.0 %
  Arm    B - A  -0.0182      0.018 -1.01    0.312 1.7 -0.0478 0.0114

Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 

Looks almost identical, but:

> avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9)$conf.low
[1] -0.04779004

# Let's check the SE:
>  avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9, df=109)$std.error
[1] 0.01800052

While the SE from the classic non-pooled Wald's z procedure is:

> sqrt( (0 * (1 - 0))/56 + ((1/55) * (1 - (1/55)))/55)
[1] 0.01801577

and this agrees with the margins:

> summary(margins::margins(mod), level = 0.9)$SE
[1] 0.01801577

So the difference comes from the standard error. I guess this comes from the differences in calculating the var-cov via delta method?

I know this difference is microscopic, but I work in controlled environment and I'm obliged to know at least the source of a discrepancy. It was caught by an independent validator and reported to me, so now I need to explain the potential causes.


I use version ‘0.13.0’ from CRAN.

Session info:

> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=Polish_Poland.utf8  LC_CTYPE=Polish_Poland.utf8    LC_MONETARY=Polish_Poland.utf8 LC_NUMERIC=C                  
[5] LC_TIME=Polish_Poland.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] marginaleffects_0.13.0 miceafter_0.5.0        mice_3.16.0            ComplexUpset_1.3.3     patchwork_1.1.2        DescTools_0.99.49     
 [7] geepack_1.3.9          emmeans_1.8.4-1        mmrm_0.2.2             flextable_0.9.2        officer_0.6.2          gghalves_0.1.4        
[13] ggplot2_3.4.2          rlang_1.1.0            tibble_3.2.1           purrr_1.0.1            tidyr_1.2.1            dplyr_1.1.1           
[19] odbc_1.3.4             DBI_1.1.3              settings_0.2.7         knitr_1.41            

loaded via a namespace (and not attached):
  [1] readxl_1.4.1            uuid_1.1-0              backports_1.4.1         Hmisc_4.7-2             systemfonts_1.0.4       plyr_1.8.8             
  [7] repr_1.1.6              splines_4.2.0           TH.data_1.1-1           digest_0.6.31           foreach_1.5.2           htmltools_0.5.4        
 [13] fansi_1.0.3             magrittr_2.0.3          checkmate_2.1.0         cluster_2.1.4           sandwich_3.0-2          askpass_1.1            
 [19] gfonts_0.2.0            jpeg_0.1-10             colorspace_2.0-3        skimr_2.1.5             blob_1.2.3              mitools_2.4            
 [25] textshaping_0.3.6       pan_1.6                 rbibutils_2.2.11        xfun_0.35               crayon_1.5.2            jsonlite_1.8.4         
 [31] Exact_3.2               lme4_1.1-31             survival_3.4-0          zoo_1.8-11              iterators_1.0.14        glue_1.6.2             
 [37] PropCIs_0.3-0           gtable_0.3.1            MatrixModels_0.5-1      car_3.1-1               rms_6.3-0               shape_1.4.6            
 [43] abind_1.4-5             SparseM_1.81            jomo_2.7-4              scales_1.2.1            fontquiver_0.2.1        mvtnorm_1.1-3          
 [49] Rcpp_1.0.10             htmlTable_2.4.1         xtable_1.8-4            foreign_0.8-84          bit_4.0.5               proxy_0.4-27           
 [55] Formula_1.2-4           prediction_0.3.14       fontLiberation_0.1.0    glmnet_4.1-6            htmlwidgets_1.6.2       httr_1.4.4             
 [61] RColorBrewer_1.1-3      ellipsis_0.3.2          pkgconfig_2.0.3         ggmice_0.0.1            deldir_1.0-6            nnet_7.3-18            
 [67] binom_1.1-1.1           utf8_1.2.2              crul_1.3                tidyselect_1.2.0        later_1.3.0             munsell_0.5.0          
 [73] cellranger_1.1.0        tools_4.2.0             cli_3.4.1               generics_0.1.3          margins_0.3.26          broom_1.0.2            
 [79] evaluate_0.20           stringr_1.5.0           fastmap_1.1.0           yaml_2.3.6              ragg_1.2.4              bit64_4.0.5            
 [85] zip_2.2.2               mitml_0.4-3             rootSolve_1.8.2.3       nlme_3.1-161            quantreg_5.94           mime_0.12              
 [91] xml2_1.3.3              compiler_4.2.0          rstudioapi_0.14         png_0.1-8               curl_4.3.3              e1071_1.7-12           
 [97] stringi_1.7.8           gdtools_0.3.3           lattice_0.20-45         Matrix_1.5-3            fontBitstreamVera_0.1.1 nloptr_2.0.3           
[103] vctrs_0.6.1             pillar_1.9.0            lifecycle_1.0.3         Rdpack_2.4              estimability_1.4.1      insight_0.19.3         
[109] data.table_1.14.6       lmom_2.9                httpuv_1.6.7            latticeExtra_0.6-30     R6_2.5.1                promises_1.2.0.1       
[115] gridExtra_2.3           gld_2.6.6               codetools_0.2-18        polspline_1.1.22        boot_1.3-28.1           MASS_7.3-58.1          
[121] openssl_2.0.5           withr_2.5.0             httpcode_0.3.0          multcomp_1.4-22         expm_0.999-6            parallel_4.2.0         
[127] hms_1.1.2               grid_4.2.0              rpart_4.1.19            coda_0.19-4             class_7.3-20            minqa_1.2.5            
[133] rmarkdown_2.19          carData_3.0-5           pROC_1.18.0             numDeriv_2016.8-1.1     shiny_1.7.4             base64enc_0.1-3        
[139] interp_1.1-3   
vincentarelbundock commented 1 year ago

Could you please read this thread and try the development version of marginaleffects from GitHub?

This sounds like a very similar issue:

https://github.com/vincentarelbundock/marginaleffects/issues/840#issuecomment-1634156285

vincentarelbundock commented 1 year ago

The last comment in that thread shows how to use the new argument in the development version: https://github.com/vincentarelbundock/marginaleffects/issues/840#issuecomment-1637165327

Generalized commented 1 year ago

Thank you very much for so quick response! The problem is solved. You nailed it perfectly!

> avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9, numderiv = list("fdcenter", eps = 1e-10))$conf.low
[1] -0.04781518

> avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9, numderiv = list("fdcenter", eps = 1e-10))$std.error
[1] 0.01801581

which is now very close margins and the "classic" procedure!


options(scipen = 99999)

# L_CI
> avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9, numderiv = list("fdcenter", eps = 1e-10))$conf.low - wald2ci(0, 56, 1, 55, conf.level = 0.9, adjust="Wald")$conf.int[1]
[1] -0.00000005986086

making a difference with respect to Wald's: 0.00013%

# SE
> avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9, numderiv = list("fdcenter", eps = 1e-10))$std.error - sqrt( (0 * (1 - 0))/56 + ((1/55) * (1 - (1/55)))/55)
[1] 0.00000003665455

which makes 0.0002% w.r.t. Wald's.

I'm sorry if asking for an obvious thing, but is this stated somewhere in the documentation? If not, I think it may be worth adding it. More people from the pharmaceutical industry may search or this issue.

Once again - huge thank you!
vincentarelbundock commented 1 year ago

Great! Let's leave this issue open until the next release (when CRAN comes back from vacation in a couple weeks).

vincentarelbundock commented 1 year ago

Version 0.14.0 was just submitted to CRAN with the better step size selection for numerical derivatives.