unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

[BUG] Inconsistent minimum time series length behaviour #2264

Open elbal opened 4 months ago

elbal commented 4 months ago

Describe the bug

Calling fit() or historical_forecasts() on an ARIMA model with a time series of fewer than 30 elements raises the error: Train series only contains N elements but ARIMA(p=0) model requires at least 30 entries

This is inconsistent with AutoARIMA, which appears to have no check on the minimum time series length and works (without issues?) for time series with fewer than 30 entries. The behaviour is also inconsistent with the statsmodels ARIMA that Darts wraps, which has no minimum time series length.

To Reproduce

import darts
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA as ARIMA_sm
from darts.models.forecasting.arima import ARIMA as ARIMA_darts
from darts.models.forecasting.auto_arima import AutoARIMA as AutoARIMA_darts

df = pd.DataFrame({"demand": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]})
ts = darts.TimeSeries.from_dataframe(df)

#### statsmodels ARIMA test - no errors
arima_sm = ARIMA_sm(endog=df["demand"])
arima_sm = arima_sm.fit()
arima_sm.predict()

#### Darts AutoARIMA test - fit() - no errors
auto_arima_darts = AutoARIMA_darts(seasonal=False, stationary=False)
auto_arima_darts.fit(ts)

#### Darts AutoARIMA test - historical_forecasts() - no errors
forecast = auto_arima_darts.historical_forecasts(ts)
forecast.pd_dataframe().head(None)

#### Darts ARIMA test - fit() - error
arima_darts = ARIMA_darts(p=0, d=1, q=0)
arima_darts = arima_darts.fit(ts)

#### Darts ARIMA test - historical_forecasts() - error
arima_darts = ARIMA_darts(p=0, d=1, q=0)
arima_darts = arima_darts.historical_forecasts(ts)

Expected behavior

I am not sure whether there should be a minimum series length, as there is none in the wrapped statsmodels ARIMA. If a minimum series length should exist, it should be consistent across the board.

Additional context None.

madtoinou commented 4 months ago

Hi @elbal,

First of all, thank you for reporting this inconsistency and for the detailed code snippet.

I am not exactly sure why the length requirements differ between these two classes (old undocumented code: min_train_series_length returns 10 for AutoARIMA and 30 for ARIMA); I am going to investigate. However, since ARIMA and AutoARIMA come from different libraries, the additional logic in AutoARIMA might come with its own constraints, and Darts can't really be responsible for making them consistent. I'll keep you updated.
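Until the thresholds are aligned, a caller can compare the series length against the model's own minimum before fitting. The following is a minimal sketch, assuming only that a model exposes a min_train_series_length attribute (as mentioned above) and that the series supports len(). The StubModel class and the can_fit helper are hypothetical names used for illustration; they are not part of Darts.

```python
from dataclasses import dataclass


@dataclass
class StubModel:
    """Hypothetical stand-in for a forecasting model; not a Darts class."""

    name: str
    min_train_series_length: int


def can_fit(model, series) -> bool:
    """Return True if the series meets the model's declared minimum length."""
    return len(series) >= model.min_train_series_length


# Minimums as reported in this thread: 30 for ARIMA, 10 for AutoARIMA.
arima_like = StubModel("ARIMA", 30)
auto_arima_like = StubModel("AutoARIMA", 10)

# Same length as the 26-element series in the report.
series = list(range(26))

for model in (arima_like, auto_arima_like):
    status = "ok" if can_fit(model, series) else "too short"
    print(f"{model.name}: {status}")
```

With the reported minimums, the 26-element series passes the check for the AutoARIMA-like stub and fails it for the ARIMA-like stub, which matches the behaviour described in the report.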

Separately, an inconsistency between min_train_series_length and min_train_samples was noticed, and I am already working on it. Fixing both at the same time sounds doable.

EDIT: Went down the blame hole and found this comment. The values of 30 and 10 seem arbitrary. I can remove them and rely on the default of 3 defined in ForecastingModel; the user will then be responsible for knowing that using such a limited number of samples will result in terrible forecasts.
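The proposal can be sketched as follows. This is an illustration only, not the Darts implementation: BaseModel, ArimaLike, AutoArimaLike, and the check inside fit() are hypothetical, and the only details taken from this thread are the default of 3 and the wording of the error message.

```python
class BaseModel:
    """Hypothetical base class playing the role of ForecastingModel."""

    @property
    def min_train_series_length(self) -> int:
        # Default minimum mentioned in the thread.
        return 3

    def fit(self, series):
        n = len(series)
        if n < self.min_train_series_length:
            raise ValueError(
                f"Train series only contains {n} elements but "
                f"{type(self).__name__} model requires at least "
                f"{self.min_train_series_length} entries"
            )
        # Record the training length; a real model would estimate parameters here.
        self.n_train_ = n
        return self


class ArimaLike(BaseModel):
    """Inherits the default instead of overriding it with 30."""


class AutoArimaLike(BaseModel):
    """Inherits the default instead of overriding it with 10."""


short_series = list(range(26))
ArimaLike().fit(short_series)      # accepted under the shared default
AutoArimaLike().fit(short_series)  # accepted under the shared default

try:
    ArimaLike().fit([0, 1])        # still rejected: below the default of 3
except ValueError as err:
    print(err)
```

Keeping the check in one place means both model classes accept or reject a given series identically, which is the consistency the report asks for.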