statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

[BUG] Cannot predict after SARIMAXResults `remove_data` method called #8384

Open jelc53 opened 2 years ago

jelc53 commented 2 years ago

Describe the bug

Forecasting fails after the SARIMAXResults remove_data method has been called. As an example, in my project the statsmodels results object is ~2 GB and remove_data reduced this to ~200 MB (which I was super excited about!), but I am then unable to reuse the results object to make predictions, which makes it rather useless.

Code Sample, a copy-pastable example if possible

Code example taken from statsmodels sarimax documentation here: https://github.com/statsmodels/statsmodels/issues/new?assignees=&labels=&template=bug_report.md&title=

import numpy as np
import pandas as pd
from scipy.stats import norm
import statsmodels.api as sm
import matplotlib.pyplot as plt
from datetime import datetime
import requests
from io import BytesIO

# Dataset
wpi1 = requests.get('https://www.stata-press.com/data/r12/wpi1.dta').content
data = pd.read_stata(BytesIO(wpi1))
data.index = data.t
# Set the frequency
data.index.freq="QS-OCT"

# Fit the model
mod = sm.tsa.statespace.SARIMAX(data['wpi'], trend='c', order=(1,1,1))
res = mod.fit(disp=False)
print(res.summary())

# Prediction
res.forecast(5)  # this works, but file size is very big

# Remove data & re-run prediction
res.remove_data()  # reduces file size substantially
res.forecast(5)  # does not work! (error msg below)

Expected Output

1991-01-01    118.358862
1991-04-01    120.340500
1991-07-01    122.167204
1991-10-01    123.858458
1992-01-01    125.431301
Freq: QS-OCT, Name: predicted_mean, dtype: float64

Error message

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/workspace/scripts/../smartshift_load_forecasting/model/arima_fourier_nem/model.py", line 37, in get_forecast_results
    new_forecast = fitted_model.forecast(steps=n_steps, exog=exog)
  File "/virtualenvs/smartshift-load-forecasting-9TtSrW0h-py3.9/lib/python3.9/site-packages/statsmodels/base/wrapper.py", line 113, in wrapper
    obj = data.wrap_output(func(results, *args, **kwargs), how)
  File "/virtualenvs/smartshift-load-forecasting-9TtSrW0h-py3.9/lib/python3.9/site-packages/statsmodels/tsa/statespace/mlemodel.py", line 3442, in forecast
    return self.predict(start=self.nobs, end=end, **kwargs)
  File "/virtualenvs/smartshift-load-forecasting-9TtSrW0h-py3.9/lib/python3.9/site-packages/statsmodels/tsa/statespace/mlemodel.py", line 3403, in predict
    prediction_results = self.get_prediction(start, end, dynamic, **kwargs)
  File "/virtualenvs/smartshift-load-forecasting-9TtSrW0h-py3.9/lib/python3.9/site-packages/statsmodels/tsa/statespace/mlemodel.py", line 3287, in get_prediction
    self.model._get_prediction_index(start, end, index))
  File "/virtualenvs/smartshift-load-forecasting-9TtSrW0h-py3.9/lib/python3.9/site-packages/statsmodels/tsa/base/tsa_model.py", line 833, in _get_prediction_index
    nobs = len(self.endog)
TypeError: object of type 'NoneType' has no len()

Output of import statsmodels.api as sm; sm.show_versions()

INSTALLED VERSIONS
------------------
Python: 3.10.4.final.0
OS: Linux 5.14.0-1048-oem #55-Ubuntu SMP Mon Aug 8 14:58:10 UTC 2022 x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8

statsmodels
===========
Installed: 0.13.0 (/home/julian/anaconda3/envs/smartshift/lib/python3.10/site-packages/statsmodels)

Required Dependencies
=====================
cython: 0.29.30 (/home/julian/anaconda3/envs/smartshift/lib/python3.10/site-packages/Cython)
numpy: 1.23.1 (/home/julian/anaconda3/envs/smartshift/lib/python3.10/site-packages/numpy)
scipy: 1.7.3 (/home/julian/anaconda3/envs/smartshift/lib/python3.10/site-packages/scipy)
pandas: 1.4.0 (/home/julian/anaconda3/envs/smartshift/lib/python3.10/site-packages/pandas)
dateutil: 2.8.2 (/home/julian/anaconda3/envs/smartshift/lib/python3.10/site-packages/dateutil)
patsy: 0.5.2 (/home/julian/anaconda3/envs/smartshift/lib/python3.10/site-packages/patsy)

Optional Dependencies
=====================
matplotlib: 3.5.2 (/home/julian/anaconda3/envs/smartshift/lib/python3.10/site-packages/matplotlib)
backend: module://matplotlib_inline.backend_inline
cvxopt: Not installed
joblib: 1.1.0 (/home/julian/anaconda3/envs/smartshift/lib/python3.10/site-packages/joblib)

Developer Tools
================
IPython: 8.4.0 (/home/julian/anaconda3/envs/smartshift/lib/python3.10/site-packages/IPython)
jinja2: Not installed
sphinx: Not installed
pygments: 2.11.2 (/home/julian/anaconda3/envs/smartshift/lib/python3.10/site-packages/pygments)
pytest: Not installed
virtualenv: Not installed

ChadFulton commented 2 years ago

Thanks for reporting this. I think we should probably be able to support forecasting after removing data, so this is probably just about what assumptions the forecasting code makes (e.g. in the error message above, it checks the length of endog which doesn't exist after the data is removed).
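
To illustrate the kind of assumption involved, here is a minimal pure-Python sketch. It is not statsmodels code: the `ToyResults` class and its `forecast_index_fragile` / `forecast_index_robust` methods are hypothetical names invented for this example. It only shows why asking removed data for its length fails, and how caching the observation count before removal would avoid that.

```python
class ToyResults:
    """Hypothetical stand-in for a fitted-results object (not statsmodels code)."""

    def __init__(self, endog):
        self.endog = list(endog)
        # Cache the sample size up front so it survives data removal.
        self.nobs = len(self.endog)

    def remove_data(self):
        # Drop the bulky data, as a size-reducing method would.
        self.endog = None

    def forecast_index_fragile(self, steps):
        # Same pattern as the failing line in the traceback:
        # the length is read from data that may no longer exist.
        nobs = len(self.endog)
        return list(range(nobs, nobs + steps))

    def forecast_index_robust(self, steps):
        # Uses the cached count and never touches the removed data.
        return list(range(self.nobs, self.nobs + steps))


res = ToyResults([1.0, 2.0, 3.0, 4.0])
res.remove_data()

fragile_error = None
try:
    res.forecast_index_fragile(2)
except TypeError as exc:
    fragile_error = str(exc)

robust_index = res.forecast_index_robust(2)
print(fragile_error)
print(robust_index)
```

Whether the real fix can be this simple depends on what else the prediction path reads from the model (for example the index, or exogenous data), so treat this only as a picture of the failure mode.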

jelc53 commented 2 years ago

Yep perfect, and I think similar for any exogenous data. Happy to work on a PR but not possible for me this week.