google / CausalImpact

An R package for causal inference in time series
Apache License 2.0

Python version #14

Open rpanai opened 7 years ago

rpanai commented 7 years ago

Are you planning, or interested in having, a Python version?

alhauser commented 7 years ago

There are at least two Python ports:

https://github.com/jamalsenouci/causalimpact
https://github.com/tcassou/causal_impact

They were developed completely independently from this original R version of CausalImpact, so there's no "official" Python version; we haven't compared the ports to the R version so far (would be interesting to do, though).

rpanai commented 7 years ago

Hi, I'm aware of these two ports but I was wondering if anything is going to come from you.

alhauser commented 7 years ago

No, we don't have plans to provide our own Python port.

WillianFuks commented 5 years ago

If anyone finds this open thread: I've just ported this library to Python using TensorFlow Probability. You can check it out here:

tfcausalimpact

All features have been ported, the code is 100% tested, results have been compared to R's and the project is fully documented.

Hope it's useful :)

rpanai commented 5 years ago

@WillianFuks it looks promising. The problem with the other two ports is that they rely on statsmodels 0.8.0, which is kind of a pain. Are you planning to build a conda package of it? If not, I can do it as I did for the jamalsenouci port: https://anaconda.org/teamcore/causalimpact

WillianFuks commented 5 years ago

@rpanai good idea :) If you could create the package that would be really cool :+1:

Yeah, I agree about statsmodels version 0.8. The newer version 0.9 was a great improvement, and in fact its new features made it possible to compute all the inferences and the p-value in this new package (also, the team is awesome; they helped me a lot to complete this project).

WillianFuks commented 3 years ago

I've just published a new package. This time the results are equivalent to the original package's, and it no longer relies on statsmodels (it now uses TensorFlow Probability).

rpanai commented 3 years ago

Nice. Do you still need a conda package?

WillianFuks commented 3 years ago

That would be awesome!

rpanai commented 3 years ago

@WillianFuks is the new package https://github.com/WillianFuks/tfcausalimpact?

WillianFuks commented 3 years ago

Yeap!

Cherishzhang commented 3 years ago

@WillianFuks we ran some A/A test experiments to measure the quality of the control set. We found that the Python package performs much better than the R package when more items are selected for the control set. Is that expected?
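For readers unfamiliar with the A/A setup mentioned above, here is a minimal sketch of the idea. It is not the model either package uses: a plain least-squares fit on a single control series stands in for the Bayesian structural time-series model, and the synthetic data, `fit_line`, and `aa_test` are all hypothetical names introduced for illustration. Because no intervention happens, a good control set should produce an estimated effect close to zero.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single control series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def aa_test(control, target, split):
    """Fit on the pre-period [0, split) and forecast the post-period.

    No intervention took place, so the relative difference between the
    observed and forecast totals should be close to zero.
    """
    a, b = fit_line(control[:split], target[:split])
    predicted = [a + b * c for c in control[split:]]
    observed = target[split:]
    return (sum(observed) - sum(predicted)) / sum(predicted)


# Synthetic data (an assumption): the target depends linearly on the control.
rng = random.Random(0)
control = [100 + 10 * rng.random() for _ in range(200)]
target = [5 + 1.2 * c + rng.gauss(0, 1) for c in control]

relative_effect = aa_test(control, target, split=140)
print(f"relative effect in A/A test: {relative_effect:+.4f}")
```

Repeating this over many placebo splits and control sets gives a distribution of spurious effects; a narrower distribution around zero suggests a better control set.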

WillianFuks commented 3 years ago

Hi @Cherishzhang ,

For Python we have pycausalimpact and tfcausalimpact. I assume you are using the latter (the former doesn't use Bayesian inference, so its results are expected to differ).

It's hard to say; I'm not sure you should observe very different results. Both packages (R and Python) are built on similar principles and formulas, with some implementation differences. One of those differences, for instance, is that the linear regression in tfcausalimpact is based on a horseshoe prior, whereas R's uses spike-and-slab.

Both serve the same purpose, but maybe the way the horseshoe prior models sparsity in the linear regression was more helpful for the dataset you are working with.

It could just as well be that R outperforms on another dataset. In general, though, I'd expect both packages to agree in their conclusions.

In a nutshell, maybe the assumptions in the Python package were helpful for the dataset you're working with, but for the most part I'd expect both to give similar results.
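To make the prior difference described above concrete, the sketch below draws regression coefficients from textbook forms of the two priors. This is not the code of either package: the hyperparameters (`tau`, `inclusion_prob`, `slab_sd`) are illustrative assumptions, and the function names are hypothetical. It shows that spike-and-slab sets most coefficients to exactly zero, while the horseshoe shrinks them toward zero without excluding them outright.

```python
import math
import random

rng = random.Random(42)


def horseshoe_draw(tau=1.0):
    """beta ~ Normal(0, (tau * lam)^2), with local scale lam ~ half-Cauchy(0, 1)."""
    # Half-Cauchy sample via the inverse CDF of a standard Cauchy.
    lam = abs(math.tan(math.pi * (rng.random() - 0.5)))
    return rng.gauss(0.0, tau * lam)


def spike_and_slab_draw(inclusion_prob=0.2, slab_sd=1.0):
    """beta is exactly 0 with probability 1 - inclusion_prob, else Normal(0, slab_sd^2)."""
    if rng.random() < inclusion_prob:
        return rng.gauss(0.0, slab_sd)
    return 0.0


n = 10_000
hs = [horseshoe_draw() for _ in range(n)]
ss = [spike_and_slab_draw() for _ in range(n)]

exact_zero_hs = sum(b == 0.0 for b in hs) / n   # continuous prior: no point mass at zero
exact_zero_ss = sum(b == 0.0 for b in ss) / n   # roughly 1 - inclusion_prob
near_zero_hs = sum(abs(b) < 0.1 for b in hs) / n

print(f"horseshoe:      exactly zero {exact_zero_hs:.3f}, |beta| < 0.1 {near_zero_hs:.3f}")
print(f"spike-and-slab: exactly zero {exact_zero_ss:.3f}")
```

With many candidate control series, the two priors can therefore weight the controls differently, which is one plausible source of diverging results.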

aazz7777 commented 5 months ago

I compared the R and Python models.

I'm wonderi


image

If you look at the bottom left, the entries are R, Python (VI), Python (HMC), and prob, respectively.

How can we tell which one performed better, and which one should I use for the recommendation? I got some discrepancies in the output for my data.

WillianFuks commented 5 months ago

This is quite weird indeed. If you plot the forecasted data, do both packages plot the same line? In your data, do you have any expectation of whether the results should be higher or lower than the counterfactual?
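One way to check the question above numerically, rather than by eye, is to export the post-period counterfactual predictions from each package and compare them point by point. The helper below is a hypothetical sketch; `compare_forecasts` is not part of either package, and the numbers are made-up placeholders, not results from this thread.

```python
def compare_forecasts(pred_a, pred_b):
    """Return (mean absolute difference, max absolute difference) of two forecasts."""
    if len(pred_a) != len(pred_b):
        raise ValueError("forecasts must cover the same time points")
    diffs = [abs(a - b) for a, b in zip(pred_a, pred_b)]
    return sum(diffs) / len(diffs), max(diffs)


# Placeholder values standing in for predictions exported from each package.
r_pred = [101.0, 102.5, 103.0, 104.2]
py_pred = [100.8, 102.9, 103.1, 104.0]

mean_diff, max_diff = compare_forecasts(r_pred, py_pred)
print(f"mean abs diff: {mean_diff:.3f}, max abs diff: {max_diff:.3f}")
```

If the two lines are close but the reported effects still disagree, the difference is more likely to come from the uncertainty intervals than from the point forecast.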

aazz7777 commented 5 months ago

Yes, these are the plots from both models. R: image. Python: image.

I have no prior expectation for this data.