Quantile forecasts are identical for all forecast horizons (XGBoost)

dwolffram commented 2 months ago

Hi there,

are there any known issues with forecasting multiple horizons and multiple quantiles with XGBoost? In my use case, I'm forecasting 1-4 weeks ahead, and somehow the forecasts are identical across all horizons.

I saw this in the latest release notes, is this maybe related? Perhaps the quantiles are computed across all horizons?

Fixed a bug in quantile_loss, where the loss was computed on all samples rather than only on the predicted quantiles. #2284

Or is there something wrong with my code? From my understanding, there should be one estimator for each quantile-horizon combination, so it's very unlikely that they are exactly the same, right?

from darts.models import XGBModel

# targets, covariates, train_end, validation_start and validation_end are defined elsewhere
QUANTILES = [0.025, 0.25, 0.5, 0.75, 0.975]

xgb = XGBModel(output_chunk_length=4,
               lags=35,
               lags_past_covariates=27,
               use_static_covariates=True,
               likelihood="quantile",
               quantiles=QUANTILES)

xgb.fit(targets[:train_end], past_covariates=covariates)

backtest = xgb.historical_forecasts(
    series=targets[:validation_end],
    past_covariates=covariates,
    start=validation_start,
    forecast_horizon=4,
    stride=1,
    last_points_only=False,
    retrain=False,
    verbose=True,
    num_samples=200
)

Thanks a lot for any input!
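To confirm the symptom before digging further, a small check like the one below can help. It is only a sketch: `horizons_are_identical` is a hypothetical helper, the numbers are made up, and it assumes the quantile values of one forecast have already been pulled out into a plain list of rows, one row per horizon step.

```python
def horizons_are_identical(forecast, tol=1e-9):
    """Return True if every horizon step carries the same quantile values.

    `forecast` is a list of rows, one per horizon step; each row holds the
    predicted values for the quantiles in a fixed order.
    """
    first = forecast[0]
    return all(
        abs(value - ref) <= tol
        for row in forecast[1:]
        for value, ref in zip(row, first)
    )


# Made-up numbers for illustration only (4 horizon steps x 5 quantiles).
suspicious = [[10.0, 12.0, 14.0, 16.0, 18.0]] * 4
healthy = [
    [10.0, 12.0, 14.0, 16.0, 18.0],
    [9.5, 12.1, 14.4, 16.9, 19.2],
    [9.1, 12.3, 14.9, 17.6, 20.4],
    [8.8, 12.4, 15.3, 18.2, 21.5],
]

print(horizons_are_identical(suspicious))
print(horizons_are_identical(healthy))
```

If the check returns True for every forecast in the backtest, the problem is more likely in the setup or the installed library versions than in the data.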

dennisbader commented 2 months ago

Hi @dwolffram, a couple of things:

- Check that your xgboost version is >= 2.0.0 (earlier versions had some issues with quantile regression).
- XGBRegressor is a tree-based method, so it cannot predict values outside the range it was trained on. See the example below on the AirPassengersDataset, which has a strong upward trend. The model was fit on the first 60 steps, and most of the later steps (the ones we use in historical forecasts) have values larger than those observed during training. At the beginning of the historical forecasts, where the values are still within the observed range, the predicted quantiles vary over the forecast horizon. Afterwards, where the target values exceed the observed range, the predicted quantiles become almost constant.
- Also, we recommend using LightGBMModel or CatBoostModel over XGBModel for quantile regression, as they usually perform better (see second and third images).
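The extrapolation point can be illustrated without any library. The sketch below is a hand-rolled toy, not XGBoost: `fit_tree` and `predict` are hypothetical helpers that build a tiny one-feature regression tree with mean-valued leaves. Real boosted trees with a quantile objective are more elaborate, but they share the property that every prediction comes from a leaf fitted on training data.

```python
def fit_tree(xs, ys, depth=3):
    """Fit a tiny 1-D regression tree; each leaf stores the mean of its targets."""
    if depth == 0 or len(xs) < 2:
        return sum(ys) / len(ys)
    # Split the points at the median of x.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    mid = len(order) // 2
    left, right = order[:mid], order[mid:]
    threshold = xs[right[0]]
    return (
        threshold,
        fit_tree([xs[i] for i in left], [ys[i] for i in left], depth - 1),
        fit_tree([xs[i] for i in right], [ys[i] for i in right], depth - 1),
    )


def predict(tree, x):
    """Walk down the tree until a leaf (a plain float) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


# Toy data with a steady upward trend, observed only for x in 0..59.
xs = list(range(60))
ys = [100 + 2 * x for x in xs]
tree = fit_tree(xs, ys)

# Inputs beyond the training range all land in the same right-most leaf.
for x in (60, 90, 120):
    print(x, predict(tree, x))
print("largest training target:", max(ys))
```

Every input past the last training point gets the same prediction, and that prediction never exceeds the largest target seen in training, which matches the flat quantiles described above.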

XGBModel output

Code (xgb trained to predict the next 6 months)

import matplotlib.pyplot as plt

from darts import concatenate
from darts.datasets import AirPassengersDataset
from darts.models import XGBModel

series = AirPassengersDataset().load()
target_end = 60
validation_start = 60
QUANTILES = [0.025, 0.25, 0.5, 0.75, 0.975]

xgb = XGBModel(
    lags=12,
    output_chunk_length=6,
    use_static_covariates=True,
    likelihood="quantile",
    quantiles=QUANTILES
)
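# fit only on the first `target_end` steps, so later values lie outside the training range
xgb.fit(series[:target_end])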

hfc = xgb.historical_forecasts(
    series=series,
    start=validation_start,
    forecast_horizon=6,
    stride=6,
    last_points_only=False,
    retrain=False,
    verbose=True,
    num_samples=200
)
hfc = concatenate(hfc, axis=0)

series.plot()
hfc.plot()
plt.show()

LightGBMModel output

CatBoostModel output

dwolffram commented 2 months ago

Thanks a lot, I was indeed still using an older version of xgboost, and updating it solved the problem! There is no trend in my dataset, so that shouldn't be an issue, but thanks for the nice examples.
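For anyone who lands here with the same symptom, a quick way to check the installed version is sketched below. It uses only the standard library; `parse_release` and `meets_minimum` are hypothetical helpers, and the 2.0.0 threshold is the one mentioned earlier in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Turn '2.0.3' into (2, 0, 3), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="2.0.0"):
    """Compare two version strings by their numeric release parts."""
    return parse_release(installed) >= parse_release(minimum)


try:
    installed = version("xgboost")
except PackageNotFoundError:
    print("xgboost is not installed in this environment")
else:
    status = "ok" if meets_minimum(installed) else "older than 2.0.0, consider upgrading"
    print(f"xgboost {installed}: {status}")
```

The comparison is deliberately crude (it ignores pre-release ordering), which should be enough for a sanity check like this one.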

unit8co / darts

Quantile forecasts are identical for all forecast horizons (XGBoost) #2382