matheusfacure / python-causality-handbook

Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.
https://matheusfacure.github.io/python-causality-handbook/landing-page.html
MIT License

Appendix on Conformal Inference with Synthetic Controls #344

Open Allgoerithm opened 1 year ago

Allgoerithm commented 1 year ago

In issue https://github.com/matheusfacure/python-causality-handbook/issues/270 by @david26694, the question comes up whether it's correct to fit the counterfactual on the whole data instead of using only the training period.

I see that the issue is closed, and the main text of the book now argues that "The idea here is that the model must be estimated with the entire data, under the postulated null hypothesis, to avoid huge post intervention residuals." This was probably added in response to the above-mentioned issue.

Still, fitting on the whole data strikes me as wrong, because the test statistic we use later looks only at the post-intervention period. If the model fits the post-intervention data badly, this not only produces large residuals in the post-intervention period, it also inflates the residuals in the pre-intervention period, because the model tries to accommodate the post-intervention data. So I don't think we can fit the model on the whole data and then use only the post-intervention residuals for the test statistic. We should either fit the model only on the pre-intervention data or use all residuals for the test statistic. I'm not sure which of the two is the better option.
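
To make the concern concrete, here is a minimal sketch on simulated data. It is not the chapter's code: the data, the variable names, and the helper `fit_weights` are all hypothetical, and plain least squares stands in for the constrained synthetic control weights. It postulates a null of zero effect while the simulated true effect is 5, fits once on all periods and once on the pre-intervention periods only, and compares the squared residuals in each window.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 40 pre-intervention periods, 10 post, 5 control units.
n_pre, n_post, n_controls = 40, 10, 5
n_total = n_pre + n_post

# Simulated control outcomes (random walks) and a treated unit that is an
# equally weighted combination of them plus noise.
controls = rng.normal(size=(n_total, n_controls)).cumsum(axis=0)
true_weights = np.full(n_controls, 1 / n_controls)
treated = controls @ true_weights + rng.normal(scale=0.5, size=n_total)

# A true effect of 5 in the post period. The postulated null is zero effect,
# so nothing is subtracted before fitting and the post data "fits badly".
treated[n_pre:] += 5.0


def fit_weights(X, y):
    """Unconstrained least squares (an assumption; a stand-in for SC weights)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def sse(residuals):
    """Sum of squared residuals."""
    return float(np.sum(residuals ** 2))


# Option A: fit on all periods. Option B: fit on pre-intervention periods only.
w_full = fit_weights(controls, treated)
w_pre = fit_weights(controls[:n_pre], treated[:n_pre])

resid_full = treated - controls @ w_full
resid_pre = treated - controls @ w_pre

pre_sse_full, post_sse_full = sse(resid_full[:n_pre]), sse(resid_full[n_pre:])
pre_sse_pre, post_sse_pre = sse(resid_pre[:n_pre]), sse(resid_pre[n_pre:])

print(f"fit on all data: pre SSE = {pre_sse_full:.2f}, post SSE = {post_sse_full:.2f}")
print(f"fit on pre only: pre SSE = {pre_sse_pre:.2f}, post SSE = {post_sse_pre:.2f}")
```

The direction of the comparison does not depend on the simulated numbers. The pre-only fit minimizes the pre-intervention squared error, so the full-data fit can only match or exceed it there; and because the full-data fit minimizes the total, its post-intervention squared error can only match or fall below that of the pre-only fit. The same argument should apply to any estimator that minimizes squared error over a fixed set of admissible weights.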

Allgoerithm commented 1 year ago

...and, by the way: Thanks for this great chapter! It goes a really long way in making these great cutting-edge methods more accessible.