sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

[TFT] NaN forecasts for series of length 1 with GroupNormalizer #1321

Open Antoine-Schwartz opened 1 year ago

Antoine-Schwartz commented 1 year ago

Expected behavior

My dataset contains series of all lengths, including several that are shorter than my max_prediction_length. The minimum length is 1 (I haven't yet tried the "pure" cold-start experience).

To ensure that TFT can provide forecasts regardless of series length, I set min_encoder_length to 0. With this configuration, I should be able to produce forecasts for all series.

Actual behavior

For a reason I don't understand, there is no problem when I don't normalize the target by group, but when I do, the forecasts for all series of length 1 are NaN; series of length 2 or more are fine!

I've already tested and checked a lot of things and dived into the source code to try to understand. The problem seems to arise at predict time; I haven't seen anything unusual earlier, for example in the construction of the datasets. Have I misunderstood something? Or is it a side-effect bug in the GroupNormalizer during inference?

Code to reproduce the problem

# imports needed to run the snippet; `data` (a pandas DataFrame) and `prediction_length` are defined elsewhere
import pytorch_lightning as pl
from pytorch_forecasting import TemporalFusionTransformer, TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer
from pytorch_forecasting.metrics import QuantileLoss

train_ds = TimeSeriesDataSet(
    data=data,
    time_idx="time_idx",
    target="qty",
    group_ids=["ts_id"],
    min_encoder_length=1,
    max_encoder_length=2*prediction_length,
    min_prediction_length=prediction_length,
    max_prediction_length=prediction_length,
    target_normalizer=GroupNormalizer(groups=["model_id"])
)

# predict=True keeps only the last max_prediction_length window of each series
pred_ds = TimeSeriesDataSet.from_dataset(train_ds, data, predict=True)

batch_size = 512
train_dataloader = train_ds.to_dataloader(train=True, batch_size=batch_size)
pred_dataloader = pred_ds.to_dataloader(train=False, batch_size=batch_size)

trainer = pl.Trainer(
    max_epochs=5, # tmp for debug
    limit_train_batches=5, # tmp for debug
    gradient_clip_val=100.0,
    accelerator="auto",
    logger=None,
)

estimator = TemporalFusionTransformer.from_dataset(
    train_ds,
    learning_rate=0.001,
    hidden_size=16,
    attention_head_size=1,
    dropout=0.1,
    hidden_continuous_size=8,
    output_size=7,
    loss=QuantileLoss(),
    log_interval=1,
    reduce_on_plateau_patience=5,
)

trainer.fit(
    estimator,
    train_dataloaders=train_dataloader
)

predictions = estimator.predict(pred_dataloader, mode="prediction", return_index=True)
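
To make the failure visible, here is a quick check on the output of the snippet above (a minimal sketch; depending on the pytorch-forecasting version, predict may return a plain tuple instead of an object with output/index attributes):

import torch

# `predictions` comes from the reproducer above; adjust the unpacking to your version
out, index = predictions.output, predictions.index
nan_mask = torch.isnan(out).any(dim=1).cpu().numpy()
# with the GroupNormalizer enabled, only the series of length 1 show up here
print(index[nan_mask])
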
Antoine-Schwartz commented 1 year ago

Could this be connected, directly or indirectly, with this issue?

alaameloh commented 1 year ago

Not sure if it helps, but what solved the issue for me was changing the transformation (softplus): for some reason it produces very small predictions (~1e-17) that get interpreted as NaN in float32, which in turn reduces the loss (and therefore the predictions) to NaN.

I'm currently using no transformation and am still experimenting to understand the exact reasons for this behaviour.
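
Roughly, the change looks like this (a sketch only; the groups column follows the GroupNormalizer call in the reproducer above):

from pytorch_forecasting.data import GroupNormalizer

# what I had before: softplus produced ~1e-17 values that ended up as NaN in float32
normalizer_softplus = GroupNormalizer(groups=["model_id"], transformation="softplus")

# what I use now: no transformation at all
normalizer_plain = GroupNormalizer(groups=["model_id"])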

youssefmecky96 commented 1 year ago

@alaameloh Any luck figuring out the exact reason? Also, where exactly did you change the transformation, if you don't mind sharing?

alaameloh commented 1 year ago

@youssefmecky96 I didn't pursue the matter further for the time being (priorities). You can set the target transformation via the target_normalizer parameter of TimeSeriesDataSet; you can also set it to "auto".
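
Concretely, something like this (a sketch assuming the same columns and variables as in the original reproducer):

from pytorch_forecasting import TimeSeriesDataSet

train_ds = TimeSeriesDataSet(
    data=data,                              # same dataframe as in the reproducer above
    time_idx="time_idx",
    target="qty",
    group_ids=["ts_id"],
    max_encoder_length=2 * prediction_length,
    max_prediction_length=prediction_length,
    target_normalizer="auto",               # let the dataset pick a suitable normalizer
)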