py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Confidence Intervals for ATE #260

Open bradyneal opened 4 years ago

bradyneal commented 4 years ago

I can estimate CATEs and their confidence intervals easily enough using estimator.effect(X=X, T0=T0, T1=T1) and estimator.effect_interval(X=X, T0=T0, T1=T1). However, what about (unconditional) average treatment effects (ATEs)? I can estimate the ATE by just taking a mean over the CATEs, estimator.effect(X=X, T0=T0, T1=T1).mean(), but I don't believe just taking the mean over the CATE intervals will give me valid confidence intervals for the ATE (feel free to correct me if I'm wrong, or if this mean over CATE intervals gives a strictly more conservative interval than what I would get if I were to directly bootstrap ATE intervals). Is there functionality in EconML to get confidence intervals for the ATE?

kbattocchi commented 4 years ago

Most of our estimators support passing None for X (you'll need to use this with both fit and effect), which should give you an estimate of the ATE.

kbattocchi commented 4 years ago

Perhaps @vasilismsr has other thoughts on whether there are more reasonable aggregation techniques given the CATEs, or other ideas, though.

bradyneal commented 4 years ago

Ohh, it's because I'm using econml.metalearners, which don't support the distinction between W and X in their fit methods (#190), huh?

vsyrgkanis commented 4 years ago

Yep. Best bet for the ATE is either LinearDRLearner with X=None or LinearDMLCateEstimator with X=None. The former might have higher variance but makes fewer assumptions; the latter might be more stable but makes a “no effect heterogeneity” assumption.
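To make the X=None suggestion a bit more concrete, here is a minimal sketch of what a single doubly robust ATE estimate and its interval look like in the simplest possible case. It is plain Python, not EconML code and not a description of EconML's internals: it assumes a binary treatment and no controls W at all, so the outcome models reduce to group means and the propensity reduces to the treated share. The function name aipw_ate and the simulated data are made up for illustration.

```python
import math
import random
import statistics


def aipw_ate(y, t, z=1.96):
    """Doubly robust (AIPW) ATE estimate with a normal-approximation interval.

    Illustrative assumption: binary treatment, no covariates, so the nuisance
    models are just group means and the treated fraction.
    """
    n = len(y)
    p = sum(t) / n  # propensity: share of treated units
    mu1 = statistics.fmean([yi for yi, ti in zip(y, t) if ti == 1])
    mu0 = statistics.fmean([yi for yi, ti in zip(y, t) if ti == 0])
    # Per-unit doubly robust scores; their mean is the ATE estimate.
    psi = [
        mu1 - mu0 + ti * (yi - mu1) / p - (1 - ti) * (yi - mu0) / (1 - p)
        for yi, ti in zip(y, t)
    ]
    ate = statistics.fmean(psi)
    # Standard error from the sample spread of the scores.
    se = statistics.stdev(psi) / math.sqrt(n)
    return ate, (ate - z * se, ate + z * se)


# Simulated data with a true ATE of 2.0 (made up for this example).
random.seed(0)
t = [1 if random.random() < 0.5 else 0 for _ in range(2000)]
y = [2.0 * ti + random.gauss(0.0, 1.0) for ti in t]

ate, (lo, hi) = aipw_ate(y, t)
print(f"ATE = {ate:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The point of the sketch is that with no X there is one number to estimate, so the interval comes directly from the variability of the per-unit scores rather than from aggregating per-unit CATE intervals.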

vsyrgkanis commented 4 years ago

Also, for any estimator we offer est.effect_inference(X).population_summary(), which also gives CIs for the ATE on the population. These are based on a “conservative” formula, though, so they might be slightly wider than the nominal ones.

However, this functionality is not yet supported for “inference=bootstrap”, which is the case for metalearners. It will be added very soon, though.
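As a rough illustration of why aggregating per-unit intervals tends to err on the conservative side, consider the sketch below. It is a generic argument, not necessarily the formula EconML uses, and the CATE estimates and standard errors are made-up numbers. For symmetric intervals, averaging the endpoints is the same as using the mean of the standard errors, and the standard deviation of an average can never exceed the average of the standard deviations, whatever the correlation between the estimates.

```python
import math

# Hypothetical per-unit CATE estimates and standard errors (made-up numbers).
cate = [1.0, 2.0, 3.0, 2.0]
se = [0.5, 0.4, 0.6, 0.5]
z = 1.96  # normal quantile for a 95% interval

n = len(cate)
ate = sum(cate) / n

# Averaging the interval endpoints is equivalent to using mean(se) as the
# standard error. That is exact only if all estimates are perfectly
# correlated, and it is an upper bound otherwise.
se_endpoint_mean = sum(se) / n

# If the estimates were independent, the standard error of their mean would
# be sqrt(sum(se_i^2)) / n, which is never larger than mean(se).
se_independent = math.sqrt(sum(s * s for s in se)) / n

ci_endpoint_mean = (ate - z * se_endpoint_mean, ate + z * se_endpoint_mean)
ci_independent = (ate - z * se_independent, ate + z * se_independent)

print("endpoint-mean interval:", ci_endpoint_mean)
print("independent-case interval:", ci_independent)
```

Two caveats: CATE estimates from a single fitted model are usually correlated, so the independent-case interval is not valid in general, and neither interval accounts for the sampling variability of X itself when the target is the population ATE.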

bradyneal commented 4 years ago

Gotcha 👍. Any rough timeline for when either of the following will be supported:

  1. The metalearners' fit() will support the W argument
  2. effect_inference() will not error when inference='bootstrap' is passed to fit()?

kbattocchi commented 4 years ago

We've got an issue filed for 1 (#190), but as far as I'm aware no immediate plans to do it; for 2 there is a PR currently in progress (#236) that should address it and we are planning to complete it and create a new release by the end of next week.

bradyneal commented 4 years ago

Gotcha. Thanks! And thanks for the great Python package.