vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com

Sampling variation via "unconditional" standard errors #1240

Open grantmcdermott opened 1 month ago

grantmcdermott commented 1 month ago

Possible follow-up to #327.

Hey Vincent,

Stata's margins command has an interesting vce(unconditional) option. The essential idea is to account for sampling variation when calculating standard errors from a sub-population of the data. To wit:

vce(unconditional) specifies that the covariates that are not fixed be treated in a way that accounts for their having been sampled. The VCE is estimated using the linearization method. This method allows for heteroskedasticity or other violations of distributional assumptions and allows for correlation among the observations in the same manner as vce(robust) and vce(cluster ...), which may have been specified with the estimation command. This method also accounts for complex survey designs if the data are svyset.[...] (p. 7)

AFAIK marginaleffects doesn't support this kind of sampling variation adjustment at present. The full mathematical rationale is laid out on p. 55 here, but effectively you need the model scores in addition to the point predictions. I know that some R model classes already provide scores as part of the default return object (most obviously: fixest). But I suppose that you could calculate them manually for the canonical base models easily enough with model.matrix(object) * residuals(object).

Feel free to put this on the back burner or mark as won't support. But I did want to flag it, since it's come up in the context of my etwfe package. Cheers.
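
For reference, the manual score calculation suggested above (design matrix times residuals) might look like the following for a plain `lm` fit. This is only a minimal base-R sketch: the `mtcars` model is an arbitrary illustration, and the formula is the OLS case only. GLMs would need the appropriately weighted working residuals; `sandwich::estfun()` is one existing implementation that handles those cases.

```r
# Minimal sketch: observation-level scores for an OLS fit, base R only.
# The model below is an arbitrary example, not taken from the issue.
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)   # n x k design matrix
e <- residuals(fit)      # length-n vector of residuals

# Row i of `scores` is x_i * e_i: the vector e is recycled down
# each column of X, so every column is multiplied by the residuals.
scores <- X * e

# At the OLS solution the scores sum to (numerically) zero in each column.
colSums(scores)
```

The column sums being zero is just the OLS normal equations, which makes it a convenient sanity check that the scores were built correctly.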

vincentarelbundock commented 1 month ago

This is interesting. Thanks for the link.

I'd be interested in supporting this (and figuring it out), but realistically won't get to it in the near future. Let's leave this open in case I unexpectedly find some time or someone shows up with good ideas.

grantmcdermott commented 1 month ago

Thanks, SGTM.

ngreifer commented 1 month ago

I've been thinking about this a lot, too. With high treatment effect modification, the uncertainty in the average comparison is understated by treating the sample distribution as fixed. I'm not able to parse the theory presented in the Stata documentation.

snhansen commented 1 week ago

A colleague and I wrote something about this recently (https://link.springer.com/article/10.1007/s00184-024-00962-4). Perhaps the theory as presented in that paper is easier to understand than Stata's documentation (which I agree is hard to digest). Around equation (18) in the paper, we also note that marginaleffects doesn't account for sampling variation in the covariates. Let me know if I can be of any help regarding this.

Edit: Just played a bit with this, and I have examples where avg_predictions() yields both larger and smaller standard errors compared to the approach in the paper (which coincides with vce(unconditional) in Stata). So it seems it's not a given that the standard error is underestimated.
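
To make the distinction discussed in this thread concrete, here is a rough base-R sketch of a linearization (influence-function) standard error for an average comparison in a simulated OLS model. It is an illustration under stated assumptions, not Stata's implementation and not necessarily the exact estimator from the paper: the data-generating process, the variable names (`d`, `x`, `if_cond`, `if_samp`), and the HC0-style variance without finite-sample corrections are all choices made here for the example.

```r
# Sketch: "conditional" vs. "unconditional" SE for an average comparison.
# Simulated data and all object names are hypothetical.
set.seed(1)
n <- 500
x <- rnorm(n)
d <- rbinom(n, 1, 0.5)
y <- 1 + d + x + 2 * d * x + rnorm(n)   # effect of d varies with x
dat <- data.frame(y, d, x)

fit <- lm(y ~ d * x, data = dat)
X <- model.matrix(fit)
e <- residuals(fit)
b <- coef(fit)

# Counterfactual design matrices: everyone set to d = 1, then to d = 0.
X1 <- model.matrix(~ d * x, data = transform(dat, d = 1))
X0 <- model.matrix(~ d * x, data = transform(dat, d = 0))

g     <- drop((X1 - X0) %*% b)   # unit-level comparisons
theta <- mean(g)                 # average comparison
Gbar  <- colMeans(X1 - X0)       # Jacobian of theta with respect to beta

# Influence of each observation on beta_hat: (X'X / n)^{-1} x_i e_i.
A_inv   <- solve(crossprod(X) / n)
if_beta <- (X * e) %*% A_inv     # n x k; A_inv is symmetric

if_cond <- drop(if_beta %*% Gbar)  # uncertainty from estimating beta
if_samp <- g - theta               # uncertainty from sampling the covariates

# Conditional: treats the covariate sample as fixed (delta method, HC0 vcov).
se_cond <- sqrt(sum(if_cond^2)) / n

# Unconditional: adds the sampling term and its cross term with if_cond.
se_uncond <- sqrt(sum((if_cond + if_samp)^2)) / n

c(estimate = theta, se_cond = se_cond, se_uncond = se_uncond)
```

The two variances differ by the sum of squares of `if_samp` plus twice its cross product with `if_cond`. The cross term has no fixed sign, so this decomposition alone does not guarantee that the conditional standard error is the smaller one in every model or sample.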