unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

Documentation TFT (past_ vs. future_covariates) #2462

Closed: tim-sadler closed this 4 days ago

tim-sadler commented 1 month ago

I am struggling with the Air Passengers example for the Temporal Fusion Transformer (https://unit8co.github.io/darts/examples/13-TFT-examples.html) and with the TFT API documentation in general (see the copy-pasted code below).

Reading it, it is not clear how past_covariates and future_covariates behave in model.fit() and model.predict().

In the example, future_covariates is set to covariates_transformed, a transformed time series that stretches over both the training and the holdout period. Isn't it true that holdout data should never go into training?

In the holdout validation of the example, future_covariates is not used at all, even though it would be considered "future" data, since the validation is against future holdout data.

Could someone explain the behavior of past_covariates and future_covariates to me in a framework of training vs. in-sample validation (e.g. using torch's val_loss) vs. out-of-sample future predictions?

Are there any examples you could refer me to that explain these concepts in detail?

Thank you very much for your help!


How the data is split:

import numpy as np
import pandas as pd

from darts import TimeSeries
from darts.dataprocessing.transformers import Scaler
from darts.datasets import AirPassengersDataset
from darts.utils.timeseries_generation import datetime_attribute_timeseries

# Read data
series = AirPassengersDataset().load()

# we convert monthly number of passengers to average daily number of passengers per month
series = series / TimeSeries.from_series(series.time_index.days_in_month)
series = series.astype(np.float32)

# Create training and validation sets:
training_cutoff = pd.Timestamp("19571201")
train, val = series.split_after(training_cutoff)

# Normalize the time series (note: we avoid fitting the transformer on the validation set)
transformer = Scaler()
train_transformed = transformer.fit_transform(train)
val_transformed = transformer.transform(val)
series_transformed = transformer.transform(series)

# create year, month and integer index covariate series
covariates = datetime_attribute_timeseries(series, attribute="year", one_hot=False)
covariates = covariates.stack(
    datetime_attribute_timeseries(series, attribute="month", one_hot=False)
)
covariates = covariates.stack(
    TimeSeries.from_times_and_values(
        times=series.time_index,
        values=np.arange(len(series)),
        columns=["linear_increase"],
    )
)
covariates = covariates.astype(np.float32)

# transform covariates (note: we fit the transformer on train split and can then transform the entire covariates series)
scaler_covs = Scaler()
cov_train, cov_val = covariates.split_after(training_cutoff)
scaler_covs.fit(cov_train)
covariates_transformed = scaler_covs.transform(covariates)
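
For reference, my_model is the notebook's TFTModel; a minimal sketch of how it is constructed (the hyperparameters here are illustrative, not necessarily the notebook's exact values):

# minimal sketch of the model construction; hyperparameters are illustrative
from darts.models import TFTModel
from darts.utils.likelihood_models import QuantileRegression

my_model = TFTModel(
    input_chunk_length=24,   # months of history seen by the encoder
    output_chunk_length=12,  # months predicted per forward pass
    likelihood=QuantileRegression(),  # probabilistic (quantile) forecasts
    random_state=42,
)

(fit() also accepts val_series and val_future_covariates, which is what I mean by in-sample validation with torch's val_loss.)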

Fit: my_model.fit(train_transformed, future_covariates=covariates_transformed, verbose=True)

Holdout validation:

import matplotlib.pyplot as plt

from darts.metrics import mape

# num_samples, figsize and the quantile levels/labels (lowest_q, low_q, high_q,
# highest_q, label_q_outer, label_q_inner) are defined earlier in the notebook
def eval_model(model, n, actual_series, val_series):
    pred_series = model.predict(n=n, num_samples=num_samples)

    # plot actual series
    plt.figure(figsize=figsize)
    actual_series[: pred_series.end_time()].plot(label="actual")

    # plot prediction with quantile ranges
    pred_series.plot(
        low_quantile=lowest_q, high_quantile=highest_q, label=label_q_outer
    )
    pred_series.plot(low_quantile=low_q, high_quantile=high_q, label=label_q_inner)

    plt.title("MAPE: {:.2f}%".format(mape(val_series, pred_series)))
    plt.legend()

eval_model(my_model, 24, series_transformed, val_transformed)
madtoinou commented 1 month ago

Hi @tim-sadler,

You can find an illustration of the difference between past and future covariates in the quickstart notebook.

Darts takes care of slicing the series, so even if the covariates extend too far into the future, the model won't have access to them during training. This is why the covariates_transformed series is used directly in the example. You can slice the series yourself if you want to convince yourself of that:

# will work
my_model.fit(train_transformed, future_covariates=covariates_transformed[:train_transformed.end_time()], verbose=True)

# won't work because covariates are too short
my_model.predict(n=1)

# will work because the covariates extend just enough into the future
ext_covariates_transformed = covariates_transformed[:train_transformed.end_time() + my_model.output_chunk_length * train_transformed.freq]
my_model.predict(n=1, future_covariates=ext_covariates_transformed)
# note that if n > output_chunk_length, the covariates will have to extend even further into the future
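
For instance, a sketch with n larger than output_chunk_length (assuming output_chunk_length=12, so n=24 needs covariates reaching 24 steps past the training end):

# will work because the covariates now extend n (> output_chunk_length) steps ahead
ext_covariates_24 = covariates_transformed[:train_transformed.end_time() + 24 * train_transformed.freq]
my_model.predict(n=24, future_covariates=ext_covariates_24)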

Since the model was trained on a single series (target and covariates), these are stored in the model, and predict() can be called directly without specifying them again. It's still possible to pass another series/covariates to the TFTModel, since it is a global model.
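
For example, a sketch of forecasting a different target at prediction time (other_series and other_covariates are hypothetical placeholders for another scaled target and its covariates):

# sketch: as a global model, TFTModel can forecast a series it was not trained on
pred = my_model.predict(
    n=12,
    series=other_series,
    future_covariates=other_covariates,
    num_samples=100,
)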

Let me know if anything is still unclear.

dennisbader commented 1 month ago

Hi @tim-sadler, I'd recommend reading our user guide on covariates from here and how the data is used for Sequential torch forecasting models from here.

TL;DR:

Edit: @madtoinou, you were quicker than me :D