Standard errors not the same as other software #849

Dear Authors,

With the following data:

org_data_glm <- structure(list(OverallSuccess = structure(c(2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("Success", 
"Failure"), class = "factor"), Arm = structure(c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), levels = c("A", 
"B"), class = "factor")), row.names = c(NA, -111L), class = "data.frame")

I first run prop.test() to calculate the two-sided 90% CI for the difference in the proportion of successes:

# Let's check the % of successes:

> org_data_glm %>% 
group_by(Arm, OverallSuccess) %>% 
count() %>% 
xtabs(n ~ Arm + OverallSuccess, data=.)
Arm Success Failure
  A       0      56
  B       1      54

# And now let's calculate the 90% CI for the difference in %
# Ignore the warning, it doesn't matter here.

> org_data_glm %>% 
group_by(Arm, OverallSuccess) %>% 
count() %>% 
xtabs(n ~ Arm + OverallSuccess, data=.) %>% 
prop.test(conf.level=0.9, correct=FALSE)

    2-sample test for equality of proportions without continuity correction

data:  .
X-squared = 1.0274, df = 1, p-value = 0.3108
alternative hypothesis: two.sided
90 percent confidence interval:
 -0.04781512  0.01145149
sample estimates:
    prop 1     prop 2 
0.00000000 0.01818182 

Warning message:
In prop.test(., conf.level = 0.9, correct = FALSE) :
  Chi-squared approximation may be incorrect

Let's note the lower bound of the CI: L_CI = -0.04781512

Now, we will reproduce this with the classic 2-sample Wald's "z" procedure:

>  PropCIs::wald2ci(0, 56, 1, 55, conf.level = 0.9, adjust="Wald")

90 percent confidence interval:
 -0.04781512  0.01145149
sample estimates:
[1] -0.01818182

OK, these methods use the same formulas, and L_CI = -0.04781512
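
For reference, the same interval can be recomputed by hand in base R. This is a minimal sketch of the unpooled Wald formula using the counts from the table above; the variable names are illustrative only.

```r
# Unpooled two-sample Wald interval for p1 - p2 (counts from the table above)
x1 <- 0; n1 <- 56   # Arm A: successes, total
x2 <- 1; n2 <- 55   # Arm B: successes, total

p1 <- x1 / n1
p2 <- x2 / n2
est <- p1 - p2

# Standard error without pooling the two proportions
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z <- qnorm(1 - (1 - 0.9) / 2)   # normal quantile for a two-sided 90% interval

wald_ci <- est + c(-1, 1) * z * se
print(wald_ci, digits = 7)
```

The lower bound should agree with the prop.test() and wald2ci() results above up to rounding.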

Now, with the average marginal effect from the logistic regression, which is its exact equivalent. First, let's try the margins package (reproducing Stata):

mod <- glm(OverallSuccess ~ Arm, family = binomial(link = 'logit'), data=org_data_glm)

>     summary(margins::margins(mod), level = 0.9)
 factor     AME     SE       z      p   lower  upper
   ArmB -0.0182 0.0180 -1.0092 0.3129 -0.0478 0.0115

More precisely: 
>     summary(margins::margins(mod), level = 0.9)$lower
[1] -0.04781512

Good! L_CI = -0.04781512

Now with marginaleffects:

>     avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9)

 Term Contrast Estimate Std. Error     z Pr(>|z|)   S   5.0 % 95.0 %
  Arm    B - A  -0.0182      0.018 -1.01    0.312 1.7 -0.0478 0.0114

Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 

Looks almost identical, but:

> avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9)$conf.low
[1] -0.04779004

# Let's check the SE:
>  avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9, df=109)$std.error
[1] 0.01800052

While the SE from the classic non-pooled Wald's z procedure is:

> sqrt( (0 * (1 - 0))/56 + ((1/55) * (1 - (1/55)))/55)
[1] 0.01801577

and this agrees with the margins:

> summary(margins::margins(mod), level = 0.9)$SE
[1] 0.01801577

So the difference comes from the standard error. I guess it comes from differences in how the var-cov matrix is calculated via the delta method?
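
To illustrate how a numerical derivative alone could produce a gap of this size, here is a minimal base-R sketch. It does not reproduce the internals of marginaleffects or margins; it only applies the delta method to Arm B's fitted probability on the logit scale (Arm A contributes zero variance here) and compares an analytic derivative with finite differences. The step sizes and helper names are assumptions chosen for illustration.

```r
# Delta method for p = plogis(eta): SE(p) = |dp/deta| * SE(eta)
n_b <- 55
p_b <- 54 / 55                                # observed proportion in Arm B
eta_b <- qlogis(p_b)                          # same proportion on the logit scale
se_eta <- sqrt(1 / (n_b * p_b * (1 - p_b)))   # SE of the logit of a binomial proportion

# Hypothetical helpers: finite-difference derivatives of plogis()
d_forward <- function(eta, h) (plogis(eta + h) - plogis(eta)) / h
d_center  <- function(eta, h) (plogis(eta + h) - plogis(eta - h)) / (2 * h)

se_analytic <- p_b * (1 - p_b) * se_eta              # exact derivative p(1 - p)
se_forward  <- d_forward(eta_b, h = 1e-3) * se_eta   # coarse one-sided step (assumed)
se_center   <- d_center(eta_b, h = 1e-6) * se_eta    # small centered step (assumed)

print(c(analytic = se_analytic, forward = se_forward, center = se_center),
      digits = 8)
```

In this sketch the analytic and centered versions reproduce the Wald SE, while the coarse forward difference comes out slightly smaller. Whether this is the actual mechanism behind the discrepancy in version 0.13.0 is only a guess.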

I know this difference is microscopic, but I work in a controlled environment and I'm obliged to know at least the source of any discrepancy. It was caught by an independent validator and reported to me, so now I need to explain the potential causes.

I use version ‘0.13.0’ from CRAN.

vincentarelbundock commented 1 year ago

Could you please read this thread and try the development version of marginaleffects from GitHub?

This sounds like a very similar issue:

vincentarelbundock commented 1 year ago

The last comment in that thread shows how to use the new argument in the development version:

Generalized commented 1 year ago

Thank you very much for such a quick response! The problem is solved. You nailed it perfectly!

> avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9, numderiv = list("fdcenter", eps = 1e-10))$conf.low
[1] -0.04781518

> avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9, numderiv = list("fdcenter", eps = 1e-10))$std.error
[1] 0.01801581

which is now very close to margins and the "classic" procedure!

options(scipen = 99999)

# L_CI
> avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9, numderiv = list("fdcenter", eps = 1e-10))$conf.low - wald2ci(0, 56, 1, 55, conf.level = 0.9, adjust="Wald")$conf.int[1]
[1] -0.00000005986086

which is a relative difference of 0.00013% with respect to Wald's.

# SE
> avg_slopes(mod, newdata = org_data_glm, conf_level = 0.9, numderiv = list("fdcenter", eps = 1e-10))$std.error - sqrt( (0 * (1 - 0))/56 + ((1/55) * (1 - (1/55)))/55)
[1] 0.00000003665455

which makes 0.0002% w.r.t. Wald's.

I'm sorry if I'm asking about something obvious, but is this stated somewhere in the documentation? If not, I think it may be worth adding. More people from the pharmaceutical industry may search for this issue.

Once again - huge thank you!

vincentarelbundock commented 1 year ago

Great! Let's leave this issue open until the next release (when CRAN comes back from vacation in a couple weeks).

vincentarelbundock commented 1 year ago

Version 0.14.0 was just submitted to CRAN with the better step size selection for numerical derivatives.