jdb78 / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License

NaN in validation predictions if max_prediction_length > 1 #388

Closed ncolella-mghpcc closed 3 years ago

ncolella-mghpcc commented 3 years ago

I have a single data series (i.e. only one group/'label') that I would like to train on with a validation set.

training_cutoff = int(data["time_idx"].max()*0.8)
training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    ...
)

validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff + 1)
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)

If max_prediction_length > 1, then the last (max_prediction_length - 1) * 2 predictions contain NaN.

e.g. if max_prediction_length = 4

baseline_predictions = Baseline().predict(val_dataloader)
baseline_predictions[-10:]

Output:
tensor([[630.2400, 630.2400, 630.2400, 630.2400],
        [630.2400, 630.2400, 630.2400, 630.2400],
        [630.2400, 630.2400, 630.2400, 630.2400],
        [630.2400, 630.2400, 630.2400, 630.2400],
        [629.0642, 629.0642, 629.0642,      nan],
        [630.2900, 630.2900,      nan,      nan],
        [630.6300,      nan,      nan,      nan],
        [277.4278,      nan,      nan,      nan],
        [277.4278, 277.4278,      nan,      nan],
        [277.4278, 277.4278, 277.4278,      nan]])

How should one do multi-step predictions (max_prediction_length > 1) on a single series with validation?


Similarly, as a more direct modification of the tutorial:

max_prediction_length = 6
max_encoder_length = 10
training_cutoff = int(data["time_idx"].max()*0.8)
training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="volume",
    group_ids=["agency", "sku"],
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals=["agency", "sku"],
    static_reals=["avg_population_2017", "avg_yearly_household_income_2017"],
    time_varying_known_categoricals=["special_days", "month"],
    variable_groups={"special_days": special_days},  # group of categorical variables can be treated as one variable
    time_varying_known_reals=["time_idx", "price_regular", "discount_in_percent"],
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=[
        "volume",
        "log_volume",
        "industry_volume",
        "soda_volume",
        "avg_max_temp",
        "avg_volume_by_agency",
        "avg_volume_by_sku",
    ],
    target_normalizer=GroupNormalizer(
        groups=["agency", "sku"], transformation="softplus"
    ),  # use softplus and normalize by group
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
)

# create validation set
validation = TimeSeriesDataSet.from_dataset(training, data[lambda x: x.time_idx > training_cutoff])

# create dataloaders for model
batch_size = 128  # set this between 32 and 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size * 10, num_workers=0)

# calculate baseline mean absolute error, i.e. predict next value as the last available value from the history
actuals = torch.cat([y for x, (y, weight) in iter(val_dataloader)])
baseline_predictions = Baseline().predict(val_dataloader)
baseline_predictions[-20:]

Output:

tensor([[1161.5100, 1161.5100, 1161.5100, 1161.5100, 1161.5100,       nan],
        [1161.5100, 1161.5100, 1161.5100, 1161.5100, 1161.5100, 1161.5100],
        [2274.0090,       nan,       nan,       nan,       nan,       nan],
        [2274.0090, 2274.0090,       nan,       nan,       nan,       nan],
        [2274.0090, 2274.0090, 2274.0090,       nan,       nan,       nan],
        [2274.0090, 2274.0090, 2274.0090, 2274.0090,       nan,       nan],
        [2274.0090, 2274.0090, 2274.0090, 2274.0090, 2274.0090,       nan],
        [2274.0090, 2274.0090, 2274.0090, 2274.0090, 2274.0090, 2274.0090],
        [  65.0475,       nan,       nan,       nan,       nan,       nan],
        [  65.0475,   65.0475,       nan,       nan,       nan,       nan],
        [  65.0475,   65.0475,   65.0475,       nan,       nan,       nan],
        [  65.0475,   65.0475,   65.0475,   65.0475,       nan,       nan],
        [  65.0475,   65.0475,   65.0475,   65.0475,   65.0475,       nan],
        [  65.0475,   65.0475,   65.0475,   65.0475,   65.0475,   65.0475],
        [   0.0000,       nan,       nan,       nan,       nan,       nan],
        [   0.0000,    0.0000,       nan,       nan,       nan,       nan],
        [   0.0000,    0.0000,    0.0000,       nan,       nan,       nan],
        [   0.0000,    0.0000,    0.0000,    0.0000,       nan,       nan],
        [   0.0000,    0.0000,    0.0000,    0.0000,    0.0000,       nan],
        [   0.0000,    0.0000,    0.0000,    0.0000,    0.0000,    0.0000]])
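For completeness, a minimal sketch (not from the original post) of finishing the baseline MAE calculation while ignoring the NaN-padded positions, assuming actuals and baseline_predictions end up with matching shapes:

# Sketch: mask out the NaN positions produced by the shorter decoder windows
# before averaging the absolute error. Assumes `actuals` and
# `baseline_predictions` from the snippet above have the same shape.
mask = ~torch.isnan(baseline_predictions)
baseline_mae = (actuals[mask] - baseline_predictions[mask]).abs().mean()
print(baseline_mae)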
jdb78 commented 3 years ago

There are NaNs because the samples have different decoder lengths. You can set min_prediction_length=max_prediction_length to ensure that the decoder length is always the same. This way, there will be no NaNs.
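A minimal sketch of applying that suggestion to the first snippet above; everything except the two length arguments is elided or assumed, not taken verbatim from the issue:

max_prediction_length = 4

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    ...,  # time_idx, target, group_ids, etc. as in the snippets above
    min_prediction_length=max_prediction_length,  # fixed decoder length
    max_prediction_length=max_prediction_length,
)

# from_dataset() reuses the training dataset's parameters, so the validation
# samples also get a constant decoder length and Baseline().predict() no
# longer pads the trailing windows with NaN.
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff + 1)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)

# Sanity check (assumption): every batch's decoder_lengths should now equal
# max_prediction_length.
x, y = next(iter(val_dataloader))
print(x["decoder_lengths"].unique())

Note that with min_prediction_length = max_prediction_length, windows at the end of a series that are too short for a full decoder are dropped rather than padded, which is what removes the NaNs.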