unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

[Question] Help understanding warning message when trying to iteratively forecast #2434

Closed christian-dalton closed 3 months ago

christian-dalton commented 4 months ago

I am currently using the Darts library for time series forecasting with the XGBModel and have encountered a warning message that I need some clarification on. I am using a dataset with daily frequency and the following columns: ds (date), cases (predictor), var1 (past cov), var2 (past cov).

While using the library, I received the following warning message after changing output_chunk_length from 7 to 1:

'predict' was called with 'n > output_chunk_length': using auto regression to forecast the values after 'output_chunk_length' points. The model will access '(n-output_chunk_length)' future values of your 'past covariates' (relative to the first predicted time step).

I understand that this warning is related to the auto-regressive nature of the forecasting when n is greater than output_chunk_length. However, I am trying to achieve a setup where the model retrains on each iteration with an output chunk length of 1, allowing me to evaluate the performance iteratively.

Could you please provide guidance on how to properly configure the model to avoid this warning while ensuring it retrains on each iteration with the desired output_chunk_length? Additionally, any recommendations on best practices for this type of iterative forecasting would be greatly appreciated.

Below is a snippet of my code:

import pandas as pd
import numpy as np
from darts import TimeSeries
from darts.models import XGBModel
from darts.metrics import mape, rmse

# Assuming df is my dataframe imported from SQL with columns ds, cases, var1, var2
series = TimeSeries.from_dataframe(df, fill_missing_values=True, freq='D', time_col='ds', fillna_value=0.0001)

# Define forecast period and other parameters
forecast = 7
traininglen = forecast * 2
lag = 7

results = []

for i in range(len(series) - traininglen - forecast):
    date = pd.to_datetime(series.time_index[i + traininglen])

    # Define training and test data
    traindata = series[:i + traininglen]
    testdata = series[i + traininglen:i + traininglen + forecast]

    past_cov = series[['cases', 'var1', 'var2']][:i + traininglen]
    future_cov = series[['cases', 'var1', 'var2']][i + traininglen:i + traininglen + forecast]

    model = XGBModel(
        lags_past_covariates=lag,
        output_chunk_length=1,  # had it 7 previously
    )

    model.fit(traindata, past_covariates=past_cov)
    pred = model.predict(n=forecast, past_covariates=past_cov)

    # Evaluate the predictions
    mapepred = mape(testdata, pred)
    rmsepred = rmse(testdata, pred)

    # Store Results
    results.append({'Date': date, 'MAPE': mapepred, 'RMSE': rmsepred,
                    'Prediction': pred.values(), 'Actual': testdata.values()})

results_df = pd.DataFrame(results)

dennisbader commented 4 months ago

Hi @christian-dalton, it sounds to me like you're trying to perform a historical forecast / backtest.

An example of this using a local model is shown in our quickstart. There is also one for global models (such as XGBModel, regression models in general, and neural-network-based models) here. You can leave output_chunk_length=7 and perform historical forecasts, which iteratively re-train, predict, and evaluate on your historical input series (or use a pre-trained model and only predict). After each iteration, the forecast point moves ahead by stride points and the steps are performed again.
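
To make the re-train / predict / evaluate loop concrete, here is a minimal, dependency-free sketch of the mechanics. It is not Darts code and not how the library implements it: `naive_fit`, `naive_predict`, and `walk_forward` are hypothetical helpers, the "model" is a simple 7-point mean standing in for XGBModel, and the parameter names `forecast_horizon` and `stride` are chosen only to mirror the terms used in this thread.

```python
from statistics import mean


def naive_fit(train):
    """Hypothetical stand-in for model.fit: 'learns' the mean of the last 7 points."""
    return mean(train[-7:])


def naive_predict(level, n):
    """Hypothetical stand-in for model.predict: repeats the learned level n times."""
    return [level] * n


def walk_forward(values, start, forecast_horizon=7, stride=1):
    """Retrain and forecast at each cutoff, moving the cutoff ahead by `stride` points."""
    results = []
    cutoff = start
    # Stop once there are not enough actual values left to score a full horizon.
    while cutoff + forecast_horizon <= len(values):
        train = values[:cutoff]  # expanding training window
        actual = values[cutoff:cutoff + forecast_horizon]
        level = naive_fit(train)  # retrain from scratch at every cutoff
        pred = naive_predict(level, forecast_horizon)
        mae = mean(abs(a - p) for a, p in zip(actual, pred))
        results.append({"cutoff": cutoff, "pred": pred, "actual": actual, "mae": mae})
        cutoff += stride
    return results


# Toy daily series: 30 points, 0.0 through 29.0.
series = [float(v) for v in range(30)]
results = walk_forward(series, start=14, forecast_horizon=7, stride=1)
```

With `stride=1` the 7-step forecasts overlap from one cutoff to the next. Setting `stride` equal to `forecast_horizon` would give non-overlapping forecast windows instead. In both cases the full 7-step horizon is produced in one shot, which is why the model's output length can stay at 7 rather than being reduced to 1.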