Result of the python package is different than the R-package result

tcassou / causal_impact

Python package for causal inference using Bayesian structural time-series models.


Result of the python package is different than the R-package result #10

Closed NimaRou closed 5 years ago

NimaRou commented 5 years ago

Dear tcassou,

Thanks for translating the R package into this nice Python package. I ran a random dataset of 71 x 2 values through both the Python package and the R package, but the predictions from the two packages differ. Have you ever encountered this before? Do you have any idea what might cause this behaviour?

Thanks in advance.

[Attached plots: python_package, R_package]

NimaRou commented 5 years ago

Or maybe better to ask: what are the default values in your package for the model options?

liyouzhang commented 5 years ago

Is this package still being maintained?

tcassou commented 5 years ago

Hi @NimaRou, @liyouzhang

Thanks for your messages! Firstly, yes, this package is still being maintained. I have not made major improvements lately, but bugs and edge cases are always fixed quickly!

Secondly, regarding the possible differences with the R package, some of that has been discussed in #5. The optimisation method is different, and there are fewer parameters in this version, I believe (the number of iterations of the solver, and the number of seasons for the seasonal component of the model, which defaults to 7 = weekly seasonality). As for the default parameters, I'm planning to change the API slightly to make them more transparent. I hope this helps!
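To illustrate why the number of seasons can change the predictions, here is a minimal standard-library sketch. It is not the package's BSTS model: the helper `seasonal_fit_rmse` and the synthetic `weekly_pattern` data are made up for illustration, and the "model" is just a per-slot mean over a cycle of a given length.

```python
import math

def seasonal_fit_rmse(series, n_seasons):
    """RMSE left after subtracting the per-slot mean of a cycle of length n_seasons.

    Hypothetical helper for illustration only; not part of causal_impact.
    """
    sums = [0.0] * n_seasons
    counts = [0] * n_seasons
    for t, value in enumerate(series):
        sums[t % n_seasons] += value
        counts[t % n_seasons] += 1
    means = [s / c for s, c in zip(sums, counts)]
    sq_err = sum((value - means[t % n_seasons]) ** 2
                 for t, value in enumerate(series))
    return math.sqrt(sq_err / len(series))

# Synthetic series with an exact 7-step cycle (assumed data), 10 full cycles.
weekly_pattern = [5.0, 3.0, 2.0, 2.0, 3.0, 6.0, 9.0]
series = [weekly_pattern[t % 7] for t in range(70)]

rmse_7 = seasonal_fit_rmse(series, 7)  # cycle length matches the data
rmse_9 = seasonal_fit_rmse(series, 9)  # cycle length does not match
print(rmse_7, rmse_9)
```

When the assumed cycle length matches the data, the seasonal fit leaves no residual; with a mismatched length it cannot. Two tools that assume different seasonal settings can therefore produce different predictions from the same input.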

tcassou commented 5 years ago

Based on commit 158b7dbd8aa1125cb6912cc1588e948a260a1133 and other minor improvements, I've just released 1.2.0.

In this version, the number of seasons in the BSTS model is passed directly to the constructor (with default = 7):

ci = CausalImpact(df, date(2018, 10, 28), n_seasons=9)

and the maximum number of iterations in the maximum likelihood estimator is passed to the run method (with default = 1000):

ci.run(max_iter=500)

Hopefully this will make the parameters clearer.
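As a rough intuition for why an iteration cap can affect results, here is a toy sketch. It is not the optimiser used by this package or by the R package: `estimate_mean`, the learning rate, and the data are assumptions chosen for illustration. It fits the mean of a unit-variance Gaussian by gradient ascent on the log-likelihood and stops after at most `max_iter` steps.

```python
def estimate_mean(data, max_iter, learning_rate=0.1, tol=1e-12):
    """Toy maximum likelihood estimate of a Gaussian mean by gradient ascent.

    Hypothetical helper for illustration only; not part of causal_impact.
    """
    mu = 0.0
    n = len(data)
    for _ in range(max_iter):
        # Gradient of the average unit-variance Gaussian log-likelihood w.r.t. mu.
        gradient = sum(x - mu for x in data) / n
        if abs(gradient) < tol:
            break  # converged before hitting the iteration cap
        mu += learning_rate * gradient
    return mu

data = [9.0, 10.0, 11.0, 10.5, 9.5]  # assumed sample
exact = sum(data) / len(data)        # closed-form maximum likelihood estimate

few = estimate_mean(data, max_iter=5)      # stopped early by the cap
many = estimate_mean(data, max_iter=1000)  # has time to converge
print(exact, few, many)
```

With a small cap the estimate is still far from the closed-form answer, while a large cap lets it converge. Solvers that use different methods or stop at different points can differ in the same way.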

As for the differences with R and possibly closing the gap, it will require more work and I'm open to suggestions/contributions via PR.

Closing this issue for now, feel free to reopen if you have more questions!