tcassou / causal_impact

Python package for causal inference using Bayesian structural time-series models.
236 stars 33 forks source link

Exception raised in plot. Row-Labels not set #4

Closed pgrandinetti closed 6 years ago

pgrandinetti commented 6 years ago

I followed your doc and ran into this

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-68-af50b5359315> in <module>()
      3 ci = CausalImpact(causal2, date_interv)
      4 ci.run()
----> 5 ci.plot()

python3.5/site-packages/causal_impact/causal_impact.py in plot(self)
    110         pred = self.fit.get_prediction()
    111         pre_model = pred.predicted_mean
--> 112         pre_lower = pred.conf_int()['lower y'].values
    113         pre_upper = pred.conf_int()['upper y'].values
    114         pre_model[:min_t] = np.nan

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

as far I could understand by looking inside the statsmodels code, the error is due to pred.conf_int() returning an array instead of a data frame, and that happens because pred.row_labels is None.

Reproducible code:

df = pandas.DataFrame(
    {'y': [150, 200, 225, 150, 175],
     'x1': [150, 249, 150, 125, 325],
     'x2': [275, 125, 249, 275, 250]
    }
)
date_interv = 2
ci = CausalImpact(df, date_interv)
ci.run()
ci.plot()
tcassou commented 6 years ago

Hi @pgrandinetti, Thanks for reporting, I will look into it! Thanks for the reproducible code too, it should make it easier!

tcassou commented 6 years ago

Hi @pgrandinetti, is your statsmodels version 0.9.0 by any chance? It looks like they introduced a breaking change from 0.8.0.

I can update the project requirements, which should solve it.

pgrandinetti commented 6 years ago

Yes, for packages version I would suggest to use '==' within the requirement file, if you may :)

Thanks!

tcassou commented 6 years ago

Hi @pgrandinetti, I uploaded a new version (1.0.2, see here) that should do it, feel free to check and close this issue if it's solved. Thanks again!

pgrandinetti commented 6 years ago

Hi @tcassou One your recent commits

https://github.com/tcassou/causal_impact/commit/078a00ec6dc854ab660f4e5cd574e2785cbc5f66

has the problem that the DEFAULT_ARGS has nseasons=7, so when I pass any date_inter that is smaller than 7 (like in the small example posted above) then it raises exception in the _check_args method, because you added a if - then raise statement in that commit, more precisely here

https://github.com/tcassou/causal_impact/commit/078a00ec6dc854ab660f4e5cd574e2785cbc5f66#diff-88a770d8a483b603717481750f53cf75R85

tcassou commented 6 years ago

Hi @pgrandinetti

I added this this check on purpose, as I realized I was not capturing cases of very small datasets like in your example. In this situation, without the check, you would introduce significantly more degrees of freedom (i.e. parameters) in the model than what can be estimated from data, so that the result would be completely random, and useless. Without this seasonality parameter (n_seasons=None), the model still has more parameters than data points in this case, so it's not even satisfying this way.

One way to prevent throwing errors could be to set the default of n_seasons to None, but I want to keep this check.

pgrandinetti commented 6 years ago

Hi @tcassous

Yes it makes sense what you said. Probably mentioning it in the docs is enough. I am closing this, thank you