unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0
7.91k stars 858 forks

Backtest with custom forecast dates #2497

Open carlocav opened 1 month ago

carlocav commented 1 month ago

Is your feature request related to a current problem? Please describe. I need to get historical daily forecasts made on the first day of each month. Right now, the method historical_forecasts only accepts an integer stride. Since months have between 28 and 31 days, no fixed integer stride can align the forecasts with the first day of each month.

Describe proposed solution The possibility to specify a custom list of forecast dates (preferred), or a time-frequency stride (e.g. month, quarter, year).

Describe potential alternatives Please let me know if there are any alternatives I can use now.

madtoinou commented 1 month ago

Hi @carlocav,

I like the idea of supporting more types for the "stride" parameter; however, it would require a lot of changes to the current logic of historical_forecasts().

If time is not a constraint, you could use historical_forecasts() to generate all the forecasts, manipulate them and then pass them to backtest() using its historical_forecasts argument.

If you're performing the backtest on a single series (and the model is trained on this same series or you don't need to retrain the model) and want to speed things up, you could do something like this:

import pandas as pd

from darts import TimeSeries
from darts.models import LinearRegressionModel
from darts.datasets import AirPassengersDataset

# Read data
series = AirPassengersDataset().load()

# Create training and validation sets:
train, val = series.split_after(0.6)

# Fit the model 
model = LinearRegressionModel(lags=3, output_chunk_length=2)
model.fit(train)

# Create a list of slices of the original series, each ending at the end of the period/horizon of interest
# Here, we pretend to be interested in the first 2 months of each year
strided_ends = [
    series.drop_after(pd.Timestamp("1957-03-01")),
    series.drop_after(pd.Timestamp("1958-03-01")),
    series.drop_after(pd.Timestamp("1959-03-01")),
]

hf = model.historical_forecasts(
    strided_ends, # the model will forecast each series in the list
    start=-2, # forecast only the period of interest
    start_format="position", # allows using a "relative index" instead of an "index value"
    retrain=False, # could be `True` if you want to retrain the model before each forecast
    last_points_only=False,
    forecast_horizon=2 # horizon of interest, assuming the first 2 months of the year
)

hf[0][-1].time_index
>>> DatetimeIndex(['1957-01-01', '1957-02-01'], dtype='datetime64[ns]', name='Month', freq='MS')

hf[2][0].time_index
>>> DatetimeIndex(['1959-01-01', '1959-02-01'], dtype='datetime64[ns]', name='Month', freq='MS')

# convert the historical forecasts to the appropriate format
hfc = [f[0] for f in hf]

model.backtest(
    series=series,
    historical_forecasts=hfc
)
>>> 2.6298539728878385

If your model needs to be trained on several series, you will have to generate the historical forecasts one by one before passing them to backtest().
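
In the example above, the cut points passed to drop_after() are written by hand. For the original use case (daily data, one forecast on the first day of each month), they could be derived with pandas. This is only a sketch: the date range, `horizon` and `min_history` are hypothetical values, and `time_index` stands in for `series.time_index`.

```python
import pandas as pd

# Hypothetical daily index, standing in for `series.time_index`
time_index = pd.date_range("2023-01-01", "2023-06-30", freq="D")

horizon = 7       # assumed forecast horizon, in days
min_history = 30  # assumed number of past days the model needs before a forecast

# Candidate forecast dates: the first day of each month present in the index
month_starts = time_index[time_index.is_month_start]

# Keep only the dates that have enough history before them
first_allowed = time_index[0] + pd.Timedelta(days=min_history)
forecast_dates = [d for d in month_starts if d >= first_allowed]

# drop_after() also drops the timestamp it is given (the example above cuts at
# March 1 and forecasts January and February), so each cut point is the first
# timestamp after the forecast window
cut_points = [d + pd.Timedelta(days=horizon) for d in forecast_dates]

for d, c in zip(forecast_dates, cut_points):
    print(d.date(), "->", c.date())
```

The list of slices would then be built as `[series.drop_after(c) for c in cut_points]`, with `start=-horizon` and `forecast_horizon=horizon` in historical_forecasts(). Each cut point must fall inside the series.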

Let me know if it worked for you.

carlocav commented 4 weeks ago

Thank you very much for suggesting this workaround.