matheusfacure / python-causality-handbook

Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.
https://matheusfacure.github.io/python-causality-handbook/landing-page.html
MIT License

Question on Chapter 21 (Meta Learners) - Classifier vs Regressor #305

Closed · sreeja-guha closed this issue 1 year ago

sreeja-guha commented 1 year ago

I wanted to understand why Regressors are being used instead of Classifiers when we are predicting a binary outcome variable. CATE is not binary, but the outcome is. When using the causalML package, I have used XGBClassifier as the base learner with a binary outcome variable. When I call "predict" with the option "return_components = True", I find that the expectations (probabilities, in this case) under treatment and control are subtracted to obtain the CATE estimate. I am wondering why LGBMRegressor is being used in this book and not LGBMClassifier. It would be fine if we were estimating CATE directly, but here we are estimating the conversion probability. What am I missing? Thank you, the book is very helpful!

maswiebe commented 1 year ago

A classifier's hard class predictions throw away information relative to a regressor. If we used a classifier to train M in the S-learner and took its 0/1 predictions, the CATE would be the difference of two binary predictions, so it could only take values in {-1, 0, 1}.
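
To make that concrete, here is a tiny sketch with made-up probabilities for a few units (illustrative values only, not output from the book's model):

import numpy as np

rng = np.random.default_rng(0)
# Pretend these are a classifier's positive-class probabilities for the same
# five units with the treatment switched on (T=1) and off (T=0).
proba_t1 = rng.uniform(size=5)
proba_t0 = rng.uniform(size=5)
hard_t1 = (proba_t1 > 0.5).astype(int)   # hard class predictions
hard_t0 = (proba_t0 > 0.5).astype(int)

print(np.unique(hard_t1 - hard_t0))      # only values from {-1, 0, 1}
print(proba_t1 - proba_t0)               # continuous effect estimates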

matheusfacure commented 1 year ago

You could use a classifier and call the predict_proba method. I just used regression because it works and the code is simpler.

# With a classifier as the S-learner, take the predicted probability of the
# positive class with the treatment switched on and off, and subtract to get the CATE.
s_learner_cate_train = (s_learner.predict_proba(train[X].assign(**{T: 1}))[:, 1] -
                        s_learner.predict_proba(train[X].assign(**{T: 0}))[:, 1])
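
For context, a self-contained sketch of an S-learner with a classifier base model. The synthetic data, the column names (x1, x2, treatment, converted) and the LGBMClassifier settings below are just placeholders, not the dataset used in the chapter:

import numpy as np
import pandas as pd
from lightgbm import LGBMClassifier

rng = np.random.default_rng(123)
n = 10_000
train = pd.DataFrame({"x1": rng.normal(size=n),
                      "x2": rng.normal(size=n),
                      "treatment": rng.integers(0, 2, size=n)})
X, T, y = ["x1", "x2"], "treatment", "converted"
# Synthetic binary conversion outcome with a small positive treatment effect.
p_conv = 1 / (1 + np.exp(-(0.5 * train["x1"] + 0.4 * train[T] - 1.0)))
train[y] = rng.binomial(1, p_conv)

# S-learner: a single classifier trained on the features plus the treatment indicator.
s_learner = LGBMClassifier(max_depth=3, min_child_samples=30)
s_learner.fit(train[X + [T]], train[y])

# CATE on the probability scale: P(converted | X, T=1) - P(converted | X, T=0).
s_learner_cate_train = (s_learner.predict_proba(train[X].assign(**{T: 1}))[:, 1] -
                        s_learner.predict_proba(train[X].assign(**{T: 0}))[:, 1])

The only change relative to the regressor version is swapping LGBMRegressor for LGBMClassifier and using predict_proba(...)[:, 1] in place of predict.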