sktime / sktime

A unified framework for machine learning with time series
https://www.sktime.net
BSD 3-Clause "New" or "Revised" License
7.58k stars 1.29k forks source link

implement probabilistic forecasts for `ThetaForecaster` using direct `statsmodels` interface instead of custom functionality in `sktime` #4485

Open yarnabrina opened 1 year ago

yarnabrina commented 1 year ago

As of now, ThetaForecaster generates probabilistic forecasts by adding white noise to point forecasts in sktime. Given that it inherits from _StatsModelsAdapter and statsmodels does have a method to generate prediction intervals for ThetaModel, it'd make sense to take advantage of that by proving a direct interface instead of maintaining a custom logic.

statsmodels example:

import numpy
import pandas
from statsmodels.tsa.forecasting.theta import ThetaModel

dates = pandas.date_range(start="2024-01-01", end="2024-01-31", freq="1D")
values = numpy.random.random_sample(size=31)

sample_data = pandas.Series(values, index=dates)

model = ThetaModel(sample_data)
results = model.fit()

results.prediction_intervals(steps=7, alpha=0.1)

Reference: https://github.com/sktime/sktime/issues/4447#issuecomment-1509944788

yarnabrina commented 8 months ago

I planned to work on this, but I now notice that ThetaForecaster is inheriting from ExponentialSmoothing in sktime, and is not using ThetaModel is statsmodels. Even though statsmodels does the same in their codebase, it probably makes sense to interface ThetaModel directly in sktime instead of maintaining custom sktime implementation.

@fkiraly Is it okay to replace existing estimator with direct interface to statsmodels.tsa.forecasting.theta.ThetaModel? Or, will you suggest to add a new estimator and keep existing one as it is?

fkiraly commented 8 months ago

@fkiraly Is it okay to replace existing estimator with direct interface to statsmodels.tsa.forecasting.theta.ThetaModel? Or, will you suggest to add a new estimator and keep existing one as it is?

Good question. Of the listed authors, I am the only one left active, and @kejsitake is the only one following sktime.

The question would be how much the user interface and logic would change, e.g., parameters like deseasonalize - this is relevant for stable behaviour downstream.

If not much changes, I would be in favour of removing the inheritance and proceeding as you suggest.

One problem that I want to highlight, currently ThetaForecaster is the only forecaster (to the best of my knowledge) covering an uncommon situation in tests, namely that a concrete estimator inherits from another estimator, with added logic or parameters. This has in the past acted as a "canary" for base class changes that affect only this situation, and there have been a few of them. Perhaps we can instead create a dummy test estimator somewhere?

yarnabrina commented 8 months ago

I think the interface will be changed a good amount:

https://www.statsmodels.org/devel/generated/statsmodels.tsa.forecasting.theta.ThetaModel.html

vs

https://www.sktime.net/en/stable/api_reference/auto_generated/sktime.forecasting.theta.ThetaForecaster.html

deseasonalize can be kept, sp can possibly be addressed by renaming with period, but initial_level is likely to be gone and new arguments will be added. Also, this will allow both additive and multiplicative models (plus auto) compared to existing estimator of multiplicative only.


I'll also note that sktime don't support probabilistic forecasts for ExponentialSmoothing, as it's using statsmodels.tsa.holtwinters.ExponentialSmoothing. Interestingly statsmodels.tsa.statespace.exponential_smoothing.ExponentialSmoothing seems to support confidence intervals. Seems to be worth adding.

fkiraly commented 8 months ago

I'll also note that sktime don't support probabilistic forecasts for ExponentialSmoothing, as it's using statsmodels.tsa.holtwinters.ExponentialSmoothing. Interestingly statsmodels.tsa.statespace.exponential_smoothing.ExponentialSmoothing seems to support confidence intervals. Seems to be worth adding.

New issue?

I think the interface will be changed a good amount:

Ok, then I would be in favour of new estimator, possibly deprecate/rename.