NickCH-K closed this issue 1 year ago
Weird, I don’t get the same result as you:
library(causaldata)
library(marginaleffects)
packageVersion("marginaleffects")
# [1] '0.13.0.9002'
df <- causaldata::restaurant_inspections
m1 <- glm(Weekend ~ Year, data = df)
avg_slopes(m1, variables = "Year") |> print(digits = 5)
#
# Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
# Year -0.00018692 8.9275e-05 -2.0938 0.036282 4.8 -0.0003619 -1.1944e-05
#
# Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Same as margins:
library(margins)
margins(m1) |> summary()
# factor AME SE z p lower upper
# Year -0.0002 0.0001 -2.0938 0.0363 -0.0004 -0.0000
I haven't tried gaussian glm but I did get identical results between marginaleffects and margins with lm. I wonder if it's logit that's causing the issue. Can you try a logit glm like in my example?
Ah yes, sorry I looked at this too quickly this morning.
The issue is that results appear very sensitive to the epsilon used in the finite difference for standard errors (dSlope/dCoefficient). I don’t know of a great principled way to choose a default that works well across the board. I would very much appreciate a recommendation if you have one!
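For context on where that epsilon enters: the delta-method SE is sqrt(J V J'), where J is the Jacobian of the statistic with respect to the coefficients and V is their covariance matrix. When J is approximated by finite differences, the step size feeds straight into the SE. A toy Python sketch of the idea (hypothetical names, not the package's actual code):

```python
# Hypothetical sketch of the delta-method SE computation under discussion
# (NOT marginaleffects' actual code): the Jacobian dg/dbeta is taken by
# forward differences, so its quality depends on the step `eps`.
import numpy as np

def delta_method_se(g, beta, vcov, eps=1e-7):
    """SE of a scalar statistic g(beta) via sqrt(J V J')."""
    g0 = g(beta)
    jac = np.empty_like(beta)
    for j in range(beta.size):
        bumped = beta.copy()
        bumped[j] += eps                      # finite-difference step
        jac[j] = (g(bumped) - g0) / eps       # forward difference
    return float(np.sqrt(jac @ vcov @ jac))

# toy inputs standing in for model coefficients and their vcov
beta = np.array([0.5, -0.2])
vcov = np.array([[0.04, 0.01],
                 [0.01, 0.02]])
g = lambda b: np.exp(b[0] + b[1])  # some smooth function of the coefs
print(delta_method_se(g, beta, vcov))
```

Any error in the finite-difference Jacobian propagates directly into the SE, which is why the z-statistics above move with `eps` even though the point estimate doesn't.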
As you can see, margins results are quite sensitive to that argument, and I'm not sure there's necessarily a good reason to prefer that package's default:
library(margins)
library(causaldata)
library(marginaleffects)
df <- causaldata::restaurant_inspections
m1 <- glm(Weekend ~ Year, data = df, family = binomial)
# default
summary(margins(m1))$z
# [1] -2.077301
eps = 10^-(4:12)
z = sapply(eps, \(e) summary(margins(m1, eps = e))$z)
data.frame(eps, z)
# eps z
# 1 1e-04 -0.56936327
# 2 1e-05 -3.64483782
# 3 1e-06 -2.16205387
# 4 1e-07 -2.07730103
# 5 1e-08 -2.06921540
# 6 1e-09 -2.06664568
# 7 1e-10 -2.01680807
# 8 1e-11 -4.99548105
# 9 1e-12 -0.09006251
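That U-shape is the textbook finite-difference trade-off: truncation error shrinks as eps gets smaller, while floating-point cancellation grows, so only a middle range of step sizes is trustworthy. A quick package-free illustration in Python (exp at 0, whose true derivative is exactly 1):

```python
# Forward difference of exp at 0 across step sizes: truncation error
# dominates for large eps, floating-point cancellation for small eps,
# so only the middle of the range is usable.
import math

for eps in [10.0 ** -k for k in range(4, 13)]:
    fd = (math.exp(eps) - 1.0) / eps
    print(f"eps = {eps:.0e}   |error| = {abs(fd - 1.0):.1e}")
```

The error is smallest somewhere near eps = 1e-8 and blows up at both ends, mirroring the z-statistic table above.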
One nice thing about marginaleffects is that you can separately control the epsilon used to compute the slope itself (dY/dX, via the eps argument) and the epsilon used to compute the derivatives for standard errors (d(dY/dX)/dB, via options()). The results below are slightly different because we only manipulate one eps at a time, but the overall picture with marginaleffects is pretty similar:
z = sapply(eps, \(e) {
options(marginaleffects_numDeriv = list( method = "simple", method.args = list(eps = e)))
avg_slopes(m1, variables = "Year")$statistic
})
data.frame(eps, z)
# eps z
# 1 1e-04 -0.55100840
# 2 1e-05 -3.69565096
# 3 1e-06 -2.16384327
# 4 1e-07 -2.07724126
# 5 1e-08 -2.06707458
# 6 1e-09 -2.10695591
# 7 1e-10 -3.24735705
# 8 1e-11 -0.34801937
# 9 1e-12 -0.07159974
If we use the (much more expensive) Richardson method for differentiation instead of the simple method, we get:
options(marginaleffects_numDeriv = list(method = "Richardson"))
avg_slopes(m1, variables = "Year")$statistic
# [1] -2.06834
I have not played with the (also arbitrary?) Richardson tuning parameters, so I'm not sure if those results are sensitive. See the numDeriv docs.
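The reason Richardson is more robust (and more expensive): it evaluates the difference quotient at several step sizes and combines them so the leading error terms cancel analytically, instead of relying on one well-chosen step. A bare-bones sketch of the idea, not numDeriv's actual implementation:

```python
# Toy sketch of one Richardson extrapolation step (not numDeriv's code):
# evaluate a central difference at h and h/2 and combine them so the
# leading O(h^2) error term cancels, leaving O(h^4).
import math

def central(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

def richardson(f, x, h):
    # numDeriv iterates this kind of combination several times
    return (4 * central(f, x, h / 2) - central(f, x, h)) / 3

x, h = 1.0, 1e-3
exact = math.exp(x)  # d/dx exp(x) = exp(x)
print(abs(central(math.exp, x, h) - exact))
print(abs(richardson(math.exp, x, h) - exact))
```

Because the error cancels analytically rather than by shrinking h, the result is far less sensitive to the step choice, which matches the stable z-statistic above.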
I don't know what statsmodels and Stata use as default for eps, and why. Would be very curious to learn about it.
Oh interesting, I wouldn't have thought to look at the eps option! Interesting that it matters so much. Thank you!
Cool cool.
Will just reopen so I remember to try out the rules of thumb in this wiki article.
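Presumably those are the standard step-size heuristics tied to machine precision: roughly eps_mach^(1/2) * max(1, |x|) for one-sided differences and eps_mach^(1/3) * max(1, |x|) for central differences. A quick sanity check of both rules (my own sketch, assuming the function is well-scaled):

```python
# Common step-size rules of thumb for finite differences (a sketch, not
# any package's code): scale the step by machine epsilon and |x|.
import math
import sys

mach = sys.float_info.epsilon  # ~2.2e-16 for doubles

def forward_step(x):
    # one-sided differences: h = sqrt(eps_mach) * scale
    return math.sqrt(mach) * max(1.0, abs(x))

def central_step(x):
    # central differences tolerate a larger step: h = eps_mach**(1/3) * scale
    return mach ** (1 / 3) * max(1.0, abs(x))

x = 1.0
hf, hc = forward_step(x), central_step(x)
fwd = (math.exp(x + hf) - math.exp(x)) / hf
cen = (math.exp(x + hc) - math.exp(x - hc)) / (2 * hc)
print(abs(fwd - math.exp(x)), abs(cen - math.exp(x)))
```

Both rules land comfortably inside the stable middle of the eps tables above, well away from the regions where the z-statistics fall apart.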
FYI, I think numeric instability is potentially a big problem and I want to make it easier for users to try out different options. First, I improved the default procedure to determine the step size. Then, in the dev version on Github, there is a new argument which allows you to do stuff like:
library(causaldata)
library(marginaleffects)
df <- causaldata::restaurant_inspections
m1 <- glm(Weekend ~ Year, data = df, family = binomial)
avg_slopes(m1, numderiv = "richardson")
#
# Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
# Year -0.000187 9.05e-05 -2.07 0.0386 4.7 -0.000365 -9.81e-06
#
# Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
avg_slopes(m1, numderiv = list("fdcenter", eps = 1e-10))
#
# Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
# Year -0.000187 0.00017 -1.1 0.269 1.9 -0.00052 0.000145
#
# Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Thanks for raising the issue!
Bug reports must include:

- The mtcars dataset, which is distributed by default with R, or one of the CSV files from the RDatasets archive.
- sessionInfo() output.
- Make sure you are running the latest development version of marginaleffects and its dependencies: https://vincentarelbundock.github.io/marginaleffects/#installation

I am getting identical AMEs but different standard errors / z-scores / p-values when using avg_slopes on a glm vs. when I do it in R's margins::margins(), in m.get_margeff() from Python's statsmodels, and margins in Stata. I think they're all supposed to be doing the delta method, so this seems like a bug, or perhaps something else odd is going on? My understanding is that the marginaleffects SEs are following Stata. Note that using the same data and model I get a z-score of -2.068 in Python and -2.07 in Stata:
sessionInfo: