tcassou / causal_impact

Python package for causal inference using Bayesian structural time-series models.
232 stars 32 forks source link

Confidence intervals differ from R CausalImpact package #5

Closed sleitner closed 5 years ago

sleitner commented 6 years ago

Hi Thomas, Thanks for working on this! I just wanted to flag that your package doesn't produce results comparable to the R CausalImpact package yet for the same reason that @bange83 identified in https://github.com/jamalsenouci/causalimpact/issues/7

To reproduce:

from causal_impact.causal_impact import CausalImpact
import pandas as pd
import sys
from io import StringIO

DATA = """
t,y,x1,x2\n
2016-02-20 22:41:20,110.0,134.0,128.0\n
2016-02-20 22:41:30,125.0,134.0,128.0\n
2016-02-20 22:41:40,123.0,134.0,128.0\n
2016-02-20 22:41:50,128.0,134.0,128.0\n
2016-02-20 22:42:00,114.0,134.0,128.0\n
2016-02-20 22:42:10,125.0,133.0,128.0\n
2016-02-20 22:42:20,119.0,133.0,128.0\n
2016-02-20 22:42:30,121.0,133.0,128.0\n
2016-02-20 22:42:40,139.0,133.0,128.0\n
2016-02-20 22:42:50,107.0,133.0,128.0\n
2016-02-20 22:43:00,115.0,132.0,128.0\n
2016-02-20 22:43:10,91.0,132.0,128.0\n
2016-02-20 22:43:20,107.0,132.0,128.0\n
2016-02-20 22:43:30,124.0,132.0,128.0\n
2016-02-20 22:43:40,116.0,131.0,128.0\n
2016-02-20 22:43:50,110.0,131.0,128.0\n
2016-02-20 22:44:00,100.0,131.0,128.0\n
2016-02-20 22:44:10,110.0,131.0,128.0\n
2016-02-20 22:44:20,113.0,129.0,128.0\n
2016-02-20 22:44:30,103.0,129.0,128.0\n
2016-02-20 22:44:40,117.0,129.0,128.0\n
2016-02-20 22:44:50,125.0,129.0,128.0\n
2016-02-20 22:45:00,115.0,129.0,128.0\n
2016-02-20 22:45:10,114.0,128.0,128.0\n
2016-02-20 22:45:20,138.0,128.0,128.0\n
2016-02-20 22:45:30,117.0,128.0,128.0\n
2016-02-20 22:45:40,104.0,128.0,128.0\n
2016-02-20 22:45:50,123.0,128.0,128.0\n
2016-02-20 22:46:00,122.0,128.0,128.0\n
2016-02-20 22:46:10,150.0,128.0,128.0\n
2016-02-20 22:46:20,127.0,128.0,128.0\n
2016-02-20 22:46:30,139.0,128.0,128.0\n
2016-02-20 22:46:40,139.0,127.0,127.0\n
2016-02-20 22:46:50,109.0,127.0,127.0\n
2016-02-20 22:47:00,107.0,127.0,127.0\n
2016-02-20 22:47:10,94.0,127.0,127.0\n
2016-02-20 22:47:20,112.0,127.0,127.0\n
2016-02-20 22:47:30,107.0,127.0,127.0\n
2016-02-20 22:47:40,126.0,127.0,127.0\n
2016-02-20 22:47:50,114.0,127.0,127.0\n
2016-02-20 22:48:00,129.0,127.0,127.0\n
2016-02-20 22:48:10,113.0,126.0,127.0\n
2016-02-20 22:48:20,114.0,126.0,127.0\n
2016-02-20 22:48:30,116.0,126.0,127.0\n
2016-02-20 22:48:40,110.0,125.0,126.0\n
2016-02-20 22:48:50,131.0,125.0,126.0\n
2016-02-20 22:49:00,109.0,125.0,126.0\n
2016-02-20 22:49:10,114.0,125.0,127.0\n
2016-02-20 22:49:20,116.0,125.0,126.0\n
2016-02-20 22:49:30,113.0,124.0,125.0\n
2016-02-20 22:49:40,108.0,124.0,125.0\n
2016-02-20 22:49:50,120.0,124.0,125.0\n
2016-02-20 22:50:00,106.0,123.0,125.0\n
2016-02-20 22:50:10,123.0,123.0,125.0\n
2016-02-20 22:50:20,123.0,123.0,124.0\n
2016-02-20 22:50:30,135.0,123.0,124.0\n
2016-02-20 22:50:40,127.0,123.0,124.0\n
2016-02-20 22:50:50,140.0,123.0,123.0\n
2016-02-20 22:51:00,139.0,123.0,123.0\n
2016-02-20 22:51:10,137.0,123.0,123.0\n
2016-02-20 22:51:20,123.0,123.0,123.0\n
2016-02-20 22:51:30,160.0,122.0,123.0\n
2016-02-20 22:51:40,173.0,122.0,123.0\n
2016-02-20 22:51:50,236.0,122.0,123.0\n
2016-02-20 22:52:00,233.0,122.0,123.0\n
2016-02-20 22:52:10,193.0,122.0,123.0\n
2016-02-20 22:52:20,169.0,122.0,123.0\n
2016-02-20 22:52:30,167.0,122.0,123.0\n
2016-02-20 22:52:40,172.0,121.0,123.0\n
2016-02-20 22:52:50,148.0,121.0,123.0\n
2016-02-20 22:53:00,125.0,121.0,123.0\n
2016-02-20 22:53:10,132.0,121.0,123.0\n
2016-02-20 22:53:20,165.0,121.0,123.0\n
2016-02-20 22:53:30,154.0,120.0,123.0\n
2016-02-20 22:53:40,158.0,120.0,123.0\n
2016-02-20 22:53:50,135.0,120.0,123.0\n
2016-02-20 22:54:00,145.0,120.0,123.0\n
2016-02-20 22:54:10,163.0,119.0,122.0\n
2016-02-20 22:54:20,146.0,119.0,122.0\n
2016-02-20 22:54:30,120.0,119.0,121.0\n
2016-02-20 22:54:40,149.0,118.0,121.0\n
2016-02-20 22:54:50,140.0,118.0,121.0\n
2016-02-20 22:55:00,150.0,117.0,121.0\n
2016-02-20 22:55:10,133.0,117.0,120.0\n
2016-02-20 22:55:20,143.0,117.0,120.0\n
2016-02-20 22:55:30,145.0,117.0,120.0\n
2016-02-20 22:55:40,145.0,117.0,120.0\n
2016-02-20 22:55:50,176.0,117.0,120.0\n
2016-02-20 22:56:00,134.0,117.0,120.0\n
2016-02-20 22:56:10,147.0,117.0,120.0\n
2016-02-20 22:56:20,131.0,117.0,120.0"""

df = pd.read_csv(StringIO(DATA))
df["t"] = pd.to_datetime(df["t"])
df.index = df["t"]
del df["t"]

ci = CausalImpact(df, pd.to_datetime('2016-02-20 22:51:20'))
ci.run()
ci.plot()

image

Compare to the R CausalImpact result (see referenced issue for code): image

tcassou commented 6 years ago

Hi @sleitner, thanks for reporting! The confidence is much wider in the python version, I'll try to see if it's a problem of definition (e.g. 99% confidence vs 90%) or a consequence of the estimators themselves. Thanks for sharing the dataset too!

btengels commented 6 years ago

Hi @tcassou, I'm not 100% sure but I think the problem is that you're taking the cumsum of confidence intervals when you should be taking confidence intervals of cumsums.

tcassou commented 6 years ago

Thanks @btengels ! I need to give a closer look at the R package first, I find it intriguing that the confidence interval has a fixed width in R, despite the time series component (for which the confidence should naturally decrease with the horizon of the prediction). Maybe the confidence interval in R is only based on the regression component then? Do you know this already?

As for this package, I'm simply using the conf_int() method from the UnobservedComponent object, which I will inspect as well.

BTW I can see on the plot that the confidence interval is missing right before the intervention date, this is not expected, I give it a look too.

tcassou commented 5 years ago

Hi @sleitner, As @btengels pointed out, I had a silly mistake in the expression of the confidence interval for the cumulative sum of impact, which is fixed by 1eccfb9377a2baefb0f53df02c054f59d6e49c09. I'll release it shortly, thanks again for reporting!

tcassou commented 5 years ago

New plots for the same dataset:

image

tcassou commented 5 years ago

Just released 1.0.4, I'll close this issue for now, please feel free to reopen if you spot anything wrong!

sleitner commented 5 years ago

Thanks for the update @tcassou ! The cumulative impact certainly looks more consistent, but there's something else going on: the uncertainty on the pointwise plot (middle panel) still looks substantially different, and the cumulative plot uncertainty band at the last point looks like 300-800 instead of 500-1,000.

tcassou commented 5 years ago

About the cumulative plot I would need to dig a bit more into the R implementation - I know it's a different solver (MCMC) so to some extent I expect slight differences. If you actually look at the pointwise difference graph, you can see that the signal keeps being higher than the model after intervention in R, while it goes back to a similar level in python. So that the cumulative impact grows higher too, while the interval width is the same. So I'd say this comes either from the model, solver or parameters and not the confidence interval (which is supposed to be symmetrical here given the normally distributed noise).

As I pointed out earlier, I find it surprising that for the pointwise difference as well as the prediction plot, the confidence interval has a fixed width in the R implementation, while there's is a set of time series components in the model (trend and season for example) for which uncertainty should grow bigger as you get further from intervention date.

If you know more about any of this please feel free to share! I'll try to explore more on my end too, thanks again for sharing!