py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

How to use EconML when we don't know how to distinguish between X and W? #656

Open itewqq opened 2 years ago

itewqq commented 2 years ago

Hi there,

I have noted the following discussion here: https://github.com/microsoft/EconML/issues/589

What I'm curious about is this: in a real scenario, I can't know which FEATURES have an effect on T and which ones don't, so how should I use EconML in such a situation?

Also, if I set all the FEATURES to X, will it have any bad effect on the final result?

Thanks!

salman-moh commented 2 years ago

Regarding a bad effect on the final result: if you can't distinguish W and X, which I assume are the sufficient adjustment set and the set of all features, then you are going to get an incorrect causal estimate. As I understand it, you go from the full set of features X to the specific set of features W through domain knowledge.

My question would be: why do we even specify X if we are specifying W?

kbattocchi commented 2 years ago

W and X should both be things that affect T and Y; the difference is that things in X are also allowed to affect the strength of the relationship between T and Y. In general, if you don't know whether something might affect that relationship or not, it's probably safer to default to including it in X rather than in W. The downside is that this makes the treatment effect estimation problem harder, so you should expect to get wider confidence intervals on the effect estimate even if it turns out that that particular feature does not affect the strength of the relationship.

salman-moh commented 2 years ago

So then are X and W together all just confounder variables?

kbattocchi commented 2 years ago

Yes, exactly. And for techniques like Double ML, we concatenate X and W together for fitting our first stage models, but then we only featurize and interact X with the T residuals when fitting the second stage model.
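
The two stages described above can be sketched by hand with NumPy alone. This is a simplified illustration, not EconML's actual implementation: ordinary least squares stands in for the first-stage learners, the featurizer is just `[1, X]`, and the simulated data and helper names (`add_intercept`, `ols_fit`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 1))            # effect modifier (hypothetical)
W = rng.normal(size=(n, 2))            # plain controls (hypothetical)
T = W[:, 0] + 0.5 * X[:, 0] + rng.normal(size=n)
Y = (1.0 + 2.0 * X[:, 0]) * T + W[:, 1] + rng.normal(size=n)

def add_intercept(A):
    """Prepend a column of ones."""
    return np.column_stack([np.ones(len(A)), A])

def ols_fit(A, b):
    """Least-squares coefficients for b ~ A."""
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef

# First stage: X and W are concatenated and used to predict both T and Y.
# Two-fold cross-fitting: each fold's residuals come from a model fit on the other fold.
XW = add_intercept(np.hstack([X, W]))
folds = rng.permutation(n) % 2
T_res = np.empty(n)
Y_res = np.empty(n)
for k in (0, 1):
    train, test = folds != k, folds == k
    T_res[test] = T[test] - XW[test] @ ols_fit(XW[train], T[train])
    Y_res[test] = Y[test] - XW[test] @ ols_fit(XW[train], Y[train])

# Second stage: only X is featurized and interacted with the T residuals.
phi = add_intercept(X)                 # features [1, X]; W does not appear here
beta = ols_fit(phi * T_res[:, None], Y_res)
print(beta)                            # coefficients of theta(X) = beta[0] + beta[1] * X
```

Since the simulated effect is 1 + 2*X, `beta` should come out close to `[1, 2]`. W enters only through the residualization step, which is why a feature placed in W cannot change the estimated effect.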

justforsoy commented 1 year ago

What if a feature does not directly affect Y and T, but it affects the strength of the relationship between T and Y? Should I add it to X?

kbattocchi commented 1 year ago

@justforsoy If it affects the relationship between T and Y, then I think it must inherently affect Y as a result of this relationship (i.e. most of our estimators assume the structural model Y=theta(X)*T+..., so Y does vary based on X even if X does not appear in the remainder of the expression).

But to directly answer your question, yes, a variable should go into X if it affects the strength of the relationship.
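
A small simulation can illustrate the point about Y=theta(X)*T+...: even when X appears nowhere except inside theta(X), Y still varies with X. The specific numbers below (theta(X) = 1 + 2*X, and T centered at 1) are assumptions chosen only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
X = rng.normal(size=n)                 # only modifies the effect of T
T = 1.0 + rng.normal(size=n)           # independent of X, mean 1 (assumed)
Y = (1.0 + 2.0 * X) * T + rng.normal(size=n)   # X has no separate term in Y

# Regress Y on X alone: np.polyfit returns [slope, intercept] for degree 1.
slope, intercept = np.polyfit(X, Y, 1)
print(slope, intercept)
```

Here E[Y | X] = (1 + 2*X) * E[T] = 1 + 2*X, so the fitted slope should be close to 2 even though X has no "direct" term in the outcome equation.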