py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Exploring multi-treatment (discrete) cases #642

Open Lekunze opened 2 years ago

Lekunze commented 2 years ago

Hey, I'm getting started with CATE models in econml and have been exploring examples for discrete treatments with p > 2. Is there built-in support for these, and how can I formulate such an example? So far I have explored passing treatment values as a list (e.g. [1, 0, 0..]) to estimate effects of individual treatments, and other examples passing the entire treatment set. But estimating heterogeneous effects, at least in _cate_estimator.py, works for binary cases, so multi-treatment examples would need to be re-formulated as multiple binary cases. Is this observation correct?

kbattocchi commented 2 years ago

Generally our estimators support only a single discrete treatment (although many estimators do support an arbitrary cardinality for that single treatment, not just binary). So if you have multiple discrete treatments you will need to manually translate them into a format that we support, either:

  1. Encoding all of them as a single discrete treatment of cardinality dt_1 x dt_2 x ..., in which case each distinct combination is treated as a unique treatment for which we compute the effect
  2. One-hot-encoding each treatment (dropping one of the columns to avoid collinearity), concatenating them, and treating them as if they were continuous. This should enable you to compute the (additive) effect of each of the different treatments separately.
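
For illustration, the two encodings above could be sketched roughly as follows. This is plain Python, not part of econml: the helper names (`combine_treatments`, `one_hot_drop_first`, `concat_one_hot`) and the toy data are hypothetical, and the sketch assumes each treatment is already coded as integers 0..k-1 with level 0 as the baseline.

```python
def combine_treatments(columns, cardinalities):
    """Option 1: collapse several discrete treatments into one label per row.

    columns: list of equal-length lists, one per treatment (values in 0..k-1).
    cardinalities: number of levels k of each treatment, in the same order.
    Each distinct combination gets its own integer label in
    0..(k_1 * k_2 * ... - 1), via a mixed-radix encoding.
    """
    labels = []
    for row in zip(*columns):
        label = 0
        for value, card in zip(row, cardinalities):
            if not 0 <= value < card:
                raise ValueError(f"treatment value {value} outside 0..{card - 1}")
            label = label * card + value
        labels.append(label)
    return labels


def one_hot_drop_first(column, cardinality):
    """One-hot encode one treatment, dropping level 0 to avoid collinearity."""
    return [
        [1.0 if value == level else 0.0 for level in range(1, cardinality)]
        for value in column
    ]


def concat_one_hot(columns, cardinalities):
    """Option 2: one-hot encode each treatment and concatenate column-wise."""
    encoded = [one_hot_drop_first(c, k) for c, k in zip(columns, cardinalities)]
    return [sum(parts, []) for parts in zip(*encoded)]


# Toy data (made up): t1 has 3 levels, t2 has 2 levels, 4 observations.
t1 = [0, 1, 2, 1]
t2 = [0, 1, 0, 1]

single_label = combine_treatments([t1, t2], [3, 2])
# single_label == [0, 3, 4, 3]; at most 3 * 2 = 6 distinct labels

design = concat_one_hot([t1, t2], [3, 2])
# design == [[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
```

Under these assumptions, `single_label` would be passed as the treatment to an estimator configured for a discrete treatment (for example with `discrete_treatment=True` on the estimators that accept it), while `design` would be passed as a multi-column treatment to an estimator left in its continuous-treatment mode. Note that option 1 needs enough observations for every combination that actually occurs, whereas option 2 assumes the effects of the separate treatments add up.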