py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Categorical but non-binary treatment #755

Open vyokky opened 1 year ago

vyokky commented 1 year ago

I have a scenario with a categorical but non-binary treatment (up to five options). Do DML and its variants, or the metalearners, support such a scenario? It seems DML assumes a partial treatment effect, which does not work for a multi-class treatment.

kbattocchi commented 1 year ago

Our DML instances do support (single) non-binary categorical treatments. Your treatment column should have the raw treatment indicators (e.g. this could be something like [1, 0, 2, 2, 0, 1, 2] or ['a', 'b', 'c', 'c', 'a', 'a'] for a treatment with 3 distinct levels) when calling fit, and likewise when calling effect.

UmaVijh commented 1 year ago

How do I interpret the CATE with such a categorical treatment? My treatment can take three values, [1,2,3], and my outcome variable is also categorical, [0,1]. Using a SingleTreeCateInterpreter I get two CATE means and two CATE stds. How do I interpret these? I get ATT estimates for T=0, T=1 and T=2 on the training data, each with two rows for the point estimate, stderr, zstat, etc. in the summary. How does one interpret the individual row effects?

kbattocchi commented 1 year ago

When a discrete treatment has multiple levels, we drop the first level (the baseline, T0), and the marginal effects should be interpreted as the effect of going from T0 to T1, from T0 to T2, etc. Any other marginal effect can be computed as a linear combination of these: for example, the effect of moving from T1 to T2 is (T0 to T2) - (T0 to T1), which is what the effect method computes when those treatments are passed as arguments. This also applies to the ATT estimates: you're getting the doubly-robust estimate of the marginal effects of moving from treatment 0 to 1 and from 0 to 2 on the population that actually received treatment T=0, T=1, or T=2 in those cases.
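To make the drop-first encoding and the linear-combination rule concrete, here is a small NumPy-only sketch. It uses plain least squares on simulated, randomly assigned treatments rather than EconML itself, so it illustrates only the arithmetic, not the library's estimator; all names and numbers are assumptions for illustration.

```python
import numpy as np

# Simulated data (assumption for illustration only).
rng = np.random.default_rng(0)
n = 5000
T = rng.integers(0, 3, size=n)            # three treatment levels: 0, 1, 2
true_means = np.array([0.0, 1.0, 3.0])    # mean outcome under each level
Y = true_means[T] + rng.normal(scale=0.5, size=n)

# One-hot encode the treatment and drop the first level (the baseline),
# keeping an intercept column for the baseline mean.
D = np.column_stack([np.ones(n), T == 1, T == 2]).astype(float)

# Least-squares coefficients: [baseline mean, effect 0->1, effect 0->2].
coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
effect_0_to_1, effect_0_to_2 = coef[1], coef[2]

# Any other contrast is a linear combination of the baseline contrasts.
effect_1_to_2 = effect_0_to_2 - effect_0_to_1

# Direct comparison of the two group means gives the same contrast.
direct_1_to_2 = Y[T == 2].mean() - Y[T == 1].mean()
print(effect_1_to_2, direct_1_to_2)
```

Because the design is saturated (one column per non-baseline level plus an intercept), the combined contrast matches the direct difference in group means up to floating-point error.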

Note that for categorical outcomes you should generally use a regressor rather than a classifier as your outcome model; see the recent discussions at #775 and #779.

ghost commented 8 months ago

In my case, I have a categorical but non-binary treatment (up to 11 options), and likewise an outcome with more than 20 categories. Do DML and its variants, or the metalearners, support such a scenario?