py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Beginner question: getting CATE estimations & confidence intervals #579

Open darjooling86 opened 2 years ago

darjooling86 commented 2 years ago

Hi,

based on the attached graph, I would like to estimate the CATE for the variable 'komplex' (discrete) on 'dlz_implementierung' (continuous) based on the treatment 'spm' (binary). From the data generation process I know that 'komplex' does not affect the assignment of 'spm' (that is, whether it is 1 or 0). But 'komplex' does determine the effect of the treatment (if 'komplex' > 5 then -10, else 0). Therefore, I would like to retrieve the treatment effect for the different levels of 'komplex', together with its confidence intervals. Using the econml estimation methods (e.g. T-Learner, CausalForestDML) with X=['komplex', 'team'], I get the effect and confidence interval at exactly that level, i.e. for each combination of 'komplex' and 'team'. Since I have to control for 'team', is there any way in this setup to get the treatment effect and confidence intervals at the 'komplex' level from the estimator? Any help is appreciated!

Thank you very much!

[Image: it_graph_true]

kbattocchi commented 2 years ago

Just to make completely sure I understand the question, you state that you

would like to estimate the CATE for the variable 'komplex' (discrete) on 'dlz_implementierung' (continuous) based on the treatment 'spm' (binary)

I assume this means that you want to assess the effect of the treatment spm on the outcome dlz_implementierung conditional on the feature komplex. Is that right? If so, this should work fine with CausalForestDML: use only komplex as your features X, put team in your controls W, and use spm as your treatment T and dlz_implementierung as your outcome Y. In the first stage, we'll fit models for T and Y based on X and W. (For the treatment model, a sufficiently rich model should be able to learn that the treatment depends on W but not directly on X, while both directly affect Y.) Then in the second stage, we'll learn the treatment effect conditional on komplex, as desired.
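To make the two-stage idea concrete, here is a minimal hand-rolled sketch in plain Python. It is not EconML's implementation: it skips cross-fitting, replaces the first-stage models with simple cell means (which only works because komplex and team are both discrete), and uses a basic robust standard error with a normal approximation for the interval. The simulated data, sample size, coefficients, and the helper name cate_with_interval are all assumptions made up for illustration; only the rule "effect is -10 if komplex > 5, else 0" and "assignment depends on team, not komplex" come from the question.

```python
import math
import random
from collections import defaultdict

random.seed(0)

# Simulated data loosely following the process described in the question.
# All numeric values below are hypothetical.
rows = []
for _ in range(20000):
    komplex = random.randint(1, 10)
    team = random.randint(0, 3)
    # Treatment assignment depends on team only, not on komplex.
    spm = 1 if random.random() < 0.2 + 0.15 * team else 0
    effect = -10.0 if komplex > 5 else 0.0
    dlz = 50.0 + 2.0 * komplex + 3.0 * team + effect * spm + random.gauss(0.0, 2.0)
    rows.append((komplex, team, spm, dlz))

# First stage: predict T and Y from (X, W). Both are discrete here,
# so the mean within each (komplex, team) cell serves as the prediction.
sums = defaultdict(lambda: [0, 0.0, 0.0])  # cell -> [count, sum of T, sum of Y]
for komplex, team, spm, dlz in rows:
    cell = sums[(komplex, team)]
    cell[0] += 1
    cell[1] += spm
    cell[2] += dlz

# Collect (T residual, Y residual) pairs grouped by komplex level only.
by_level = defaultdict(list)
for komplex, team, spm, dlz in rows:
    count, t_sum, y_sum = sums[(komplex, team)]
    by_level[komplex].append((spm - t_sum / count, dlz - y_sum / count))


def cate_with_interval(residuals, z=1.96):
    """Second stage: slope of Y residual on T residual, with a robust interval."""
    denom = sum(t * t for t, _ in residuals)
    theta = sum(t * y for t, y in residuals) / denom
    meat = sum((t * (y - theta * t)) ** 2 for t, y in residuals)
    se = math.sqrt(meat) / denom
    return theta, theta - z * se, theta + z * se


results = {level: cate_with_interval(res) for level, res in sorted(by_level.items())}
for level, (theta, lower, upper) in results.items():
    print(f"komplex={level:2d}  CATE={theta:7.2f}  95% CI=[{lower:6.2f}, {upper:6.2f}]")
```

Because team enters only the first stage, the second stage reports one effect and one interval per komplex level. With EconML itself, the analogous step would be to fit with X set to komplex and W set to team, then call effect and effect_interval on an array of the komplex levels of interest.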