Causalforests and high volume panel data

py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.

https://www.microsoft.com/en-us/research/project/alice/

Causalforests and high volume panel data #515

Open tbosda opened 3 years ago

tbosda commented 3 years ago

I am wondering what the correct approach is when using DoWhy/EconML with a causal forest on panel data with time and firm fixed effects, or time and industry fixed effects, after confirming that these effects exist (Hausman test et al.). Can one just use time and firm/industry ID as covariates, or would one need to include one-hot encodings / dummies, which would lead to many .... (3500 firms, 49 industries, 12 years in my case, 32000 "observations"). Please advise!

Andyzr commented 3 years ago

Including one-hot encodings/dummies is a correct approach but may not be feasible because of the huge number of variables. Using the IDs as covariates may not handle the fixed effects correctly. I would suggest using group-wise demeaning, which achieves the same effect as adding dummies. You can see the idea implemented in PanelOLS (in the linearmodels package).
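For illustration, here is a minimal sketch of the group-wise demeaning idea in plain pandas. It is not taken from EconML or PanelOLS: the helper name `demean_two_way`, the column names (`firm`, `year`, `y`, `t`) and the toy data are all assumptions made for the example. It subtracts firm means and then year means, repeating until nothing changes, which on a balanced panel converges after one pass.

```python
import numpy as np
import pandas as pd


def demean_two_way(df, cols, entity, time, n_iter=50, tol=1e-10):
    """Hypothetical helper: remove entity and time means from `cols`.

    Alternates between subtracting entity means and time means until the
    values stop changing (needed for unbalanced panels).
    """
    out = df[cols].astype(float).copy()
    for _ in range(n_iter):
        prev = out.copy()
        out = out - out.groupby(df[entity]).transform("mean")
        out = out - out.groupby(df[time]).transform("mean")
        if np.abs(out - prev).to_numpy().max() < tol:
            break
    return out


# Toy balanced panel (assumed data): 5 firms observed over 4 years.
rng = np.random.default_rng(0)
panel = pd.DataFrame({
    "firm": np.repeat(np.arange(5), 4),
    "year": np.tile(np.arange(2010, 2014), 5),
})
firm_fe = rng.normal(size=5)[panel["firm"].to_numpy()]
year_fe = rng.normal(size=4)[(panel["year"] - 2010).to_numpy()]
panel["t"] = rng.normal(size=len(panel))
# Outcome = firm effect + year effect + 2 * treatment (no noise, for clarity).
panel["y"] = firm_fe + year_fe + 2.0 * panel["t"]

# Demeaned outcome and treatment, with the fixed effects swept out.
within = demean_two_way(panel, ["y", "t"], "firm", "year")
```

The demeaned `y` and `t` (and any demeaned controls) could then be passed to the estimator in place of the raw columns, without adding thousands of dummy columns. Note that demeaning reproduces the dummy-variable estimate exactly only for linear models; with a forest it should be read as an approximation.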