pymc-labs / CausalPy

A Python package for causal inference in quasi-experimental settings
https://causalpy.readthedocs.io
Apache License 2.0

Allow 'extrapolation' in synthetic control #256

Open drbenvincent opened 9 months ago

drbenvincent commented 9 months ago

As highlighted in #255, the current synthetic control model is restricted to interpolation. That is, when the synthetic control is modelled as a weighted sum of untreated units and the weights sum to 1, the model can only interpolate within the range spanned by the control units.

We should consider adding the ability for synthetic control models to extrapolate, and that can be done by letting the sum of weights deviate from 1.
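To make the interpolation point concrete, here is a minimal NumPy sketch. It uses made-up data and a hypothetical `weighted_sum` helper rather than CausalPy's `WeightedSumFitter`, and it assumes the weights are non-negative as well as summing to 1. With such weights the synthetic control stays between the lowest and highest control unit at every time point; once the weights sum to more than 1, it can leave that range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up pre-treatment outcomes: 20 time points x 3 control units,
# centred on levels 1, 2 and 3 (illustrative data only).
controls = rng.normal(loc=[1.0, 2.0, 3.0], scale=0.1, size=(20, 3))


def weighted_sum(controls, weights):
    """Hypothetical helper: synthetic control as a weighted sum of control units."""
    return controls @ np.asarray(weights)


lower = controls.min(axis=1)  # lowest control unit at each time point
upper = controls.max(axis=1)  # highest control unit at each time point

# Non-negative weights that sum to 1 (a convex combination): interpolation only.
convex = weighted_sum(controls, [0.2, 0.3, 0.5])
stays_inside = bool(np.all((convex >= lower - 1e-12) & (convex <= upper + 1e-12)))

# Same proportions, but the weights now sum to 1.5: extrapolation becomes possible.
scaled = weighted_sum(controls, [0.3, 0.45, 0.75])
goes_outside = bool(np.any(scaled > upper))

print(stays_inside, goes_outside)
```

So a treated unit whose level sits above (or below) every control unit cannot be matched unless the sum-to-1 constraint is relaxed in some way.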

I have not yet thought about whether we should stick with the current WeightedSumFitter class and make it more customisable, or if we should create different classes. Maybe the former.

juanitorduz commented 6 months ago

My understanding is that we actually do not wanna interpolate. See https://matheusfacure.github.io/python-causality-handbook/15-Synthetic-Control.html and the nice image (😆)

image

We could consider adding synthetic diff-in-diff, which adds an intercept https://matheusfacure.github.io/python-causality-handbook/25-Synthetic-Diff-in-Diff.html
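A rough NumPy sketch of what the intercept buys, under strong simplifying assumptions: the data are made up, the unit weights are fixed by hand rather than estimated, and only the intercept idea is shown, not the full synthetic diff-in-diff estimator (which also uses time weights). The treated unit here follows the same weighted sum as the controls but sits 5 units higher, a level shift that sum-to-1 weights alone cannot absorb.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up pre-treatment outcomes: 20 time points x 3 control units.
controls = rng.normal(loc=[1.0, 2.0, 3.0], scale=0.1, size=(20, 3))

# Hypothetical, hand-picked weights (non-negative, sum to 1); not estimated.
weights = np.array([0.2, 0.3, 0.5])

# Treated unit: same weighted sum of controls, shifted up by a constant level.
treated = controls @ weights + 5.0

# Without an intercept, the synthetic control misses the level shift.
synthetic = controls @ weights
rmse_without = float(np.sqrt(np.mean((treated - synthetic) ** 2)))

# With an intercept, estimated here as the mean pre-treatment gap.
intercept = float(np.mean(treated - synthetic))
rmse_with = float(np.sqrt(np.mean((treated - (intercept + synthetic)) ** 2)))

print(rmse_without, rmse_with, intercept)
```

The intercept lets the model match a treated unit that lies outside the range of the controls while keeping the weights themselves on the simplex.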

drbenvincent commented 2 months ago

Though see this paper:

Ben-Michael, E., Feller, A., & Rothstein, J. (2021). The augmented synthetic control method. Journal of the American Statistical Association, 116(536), 1789-1803.

The synthetic control method (SCM) is a popular approach for estimating the impact of a treatment on a single unit in panel data settings. The “synthetic control” is a weighted average of control units that balances the treated unit’s pre-treatment outcomes as closely as possible. A critical feature of the original proposal is to use SCM only when the fit on pre-treatment outcomes is excellent. We propose Augmented SCM as an extension of SCM to settings where such pre-treatment fit is infeasible. Analogous to bias correction for inexact matching, Augmented SCM uses an outcome model to estimate the bias due to imperfect pre-treatment fit and then de-biases the original SCM estimate. Our main proposal, which uses ridge regression as the outcome model, directly controls pre-treatment fit while minimizing extrapolation from the convex hull. This estimator can also be expressed as a solution to a modified synthetic controls problem that allows negative weights on some donor units. We bound the estimation error of this approach under different data generating processes, including a linear factor model, and show how regularization helps to avoid over-fitting to noise. We demonstrate gains from Augmented SCM with extensive simulation studies and apply this framework to estimate the impact of the 2012 Kansas tax cuts on economic growth. We implement the proposed method in the new augsynth R package.

drbenvincent commented 2 months ago

> We could consider adding synthetic diff-in-diff, which adds an intercept https://matheusfacure.github.io/python-causality-handbook/25-Synthetic-Diff-in-Diff.html

Agreed, we already have an issue for this https://github.com/pymc-labs/CausalPy/issues/47