pymc-labs / CausalPy

A Python package for causal inference in quasi-experimental settings
https://causalpy.readthedocs.io
Apache License 2.0

Add validation period to synthetic control and interrupted time series #364

Open drbenvincent opened 3 months ago

drbenvincent commented 3 months ago

At the moment the model parameters are estimated on the whole pre-intervention period. Instead, we could estimate parameters only up to the start of a validation window, and produce synthetic control predictions for the validation period as well as the post-intervention period.
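To make the proposed split concrete, here is a minimal sketch using pandas. The dates, `treatment_time`, and `validation_start` values are illustrative assumptions, not from CausalPy:

```python
import numpy as np
import pandas as pd

# Hypothetical dates, chosen purely for illustration.
dates = pd.date_range("2020-01-01", periods=100, freq="D")
treatment_time = dates[80]    # start of the post-intervention period
validation_start = dates[60]  # start of the held-out validation window

df = pd.DataFrame({"y": np.random.default_rng(0).normal(size=100)}, index=dates)

# Parameters would be estimated on the training window only; the fitted
# model then predicts both the validation and post-intervention periods.
train = df[df.index < validation_start]
validation = df[(df.index >= validation_start) & (df.index < treatment_time)]
post = df[df.index >= treatment_time]
```

The three windows partition the data: nothing after `validation_start` informs the parameter estimates, so the validation period acts as a genuine out-of-sample check before the intervention.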

From a frequentist perspective, a motivation for this is guarding against overfitting to the training period. This may be less of a concern with Bayesian parameter estimation, where the priors can provide a degree of regularisation. Nevertheless, it could be a useful feature to have.

We might want to automatically compute some goodness-of-fit metrics on the validation period.
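A sketch of what such metrics could look like, using generic summaries (RMSE, MAE, R²) rather than any existing CausalPy API; the function name is hypothetical:

```python
import numpy as np

def validation_metrics(observed, predicted):
    """Generic goodness-of-fit summaries for a validation window.

    This is an illustrative helper, not an existing CausalPy function.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    resid = observed - predicted
    rmse = float(np.sqrt(np.mean(resid**2)))
    mae = float(np.mean(np.abs(resid)))
    ss_res = float(np.sum(resid**2))
    ss_tot = float(np.sum((observed - observed.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "r2": r2}
```

With Bayesian predictions one could go further and score the whole posterior predictive distribution (e.g. via coverage of credible intervals), but point summaries like these are a simple starting place.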

Could either have a new notebook which covers this functionality or build it into the existing synthetic control notebooks.

[Screenshot attached: 2024-06-20]

Inspired by https://netflixtechblog.com/round-2-a-survey-of-causal-inference-applications-at-netflix-fd78328ee0bb?gi=7d795057528e

The changes would be implemented in cp.pymc_experiments.SyntheticControl so that they are specific to this experiment but not the model. It could either be done by adding a kwarg with information about the start of the validation period (for example) or by creating a new class, something like cp.pymc_experiments.SyntheticControlWithValidation. The key point is that these changes relate to the experiment, not the model. For example, right now vanilla synthetic control as a model is implemented by cp.pymc_models.WeightedSumFitter, but we would want the new validation functionality to remain available even if we swap out vanilla synthetic control for augmented or penalised SC.
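A rough sketch of the kwarg-based option. The class layout and the `validation_start` argument are hypothetical illustrations of the proposal, not the existing CausalPy API; the point is that the experiment class owns the train/validation/post split, while any model (vanilla, augmented, or penalised SC) can be passed in unchanged:

```python
import numpy as np
import pandas as pd

class SyntheticControlSketch:
    """Hypothetical experiment class illustrating the proposed kwarg.

    Not an existing CausalPy class; CausalPy's real experiment classes
    live in cp.pymc_experiments.
    """

    def __init__(self, data, treatment_time, model=None, validation_start=None):
        # `model` could be any SC model (e.g. a WeightedSumFitter-style
        # object, or an augmented/penalised variant); the split below does
        # not depend on it.
        self.model = model
        train_end = validation_start if validation_start is not None else treatment_time
        self.train = data[data.index < train_end]
        self.validation = data[(data.index >= train_end) & (data.index < treatment_time)]
        self.post = data[data.index >= treatment_time]
        # The model would be fitted on self.train only; predictions for
        # self.validation and self.post would then come from the fitted model.

# Illustrative usage with an integer index.
df = pd.DataFrame({"y": np.arange(100.0)})
exp = SyntheticControlSketch(df, treatment_time=80, validation_start=60)
```

When `validation_start` is omitted, the split degenerates to the current behaviour (fit on the whole pre-intervention period), which is what makes a kwarg attractive relative to a separate class: existing code keeps working unchanged.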