chefPony opened this issue 3 years ago
Hi, I independently came to the same conclusions. It would be very useful to improve the tutorial by proposing a different validation method than just "over the last sample" as imposed by predict=True. Most people would like to validate over several sequences in a given lookback window:
# hold out the last 30 days as a validation window
Cutoff_Date = data['Datetime'].max() - pd.to_timedelta('30D')
data_train = data[data['Datetime'] < Cutoff_Date]
data_val = data[data['Datetime'] >= Cutoff_Date]

batch_size = 128
validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)
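To make the idea concrete, here is a minimal sketch of what validating over the whole window could look like, assuming the data has the usual integer time_idx column (the cutoff conversion and names below are mine, just for illustration, not part of the snippet above):

# convert the date cutoff into the integer time index used by TimeSeriesDataSet
training_cutoff = data.loc[data['Datetime'] < Cutoff_Date, 'time_idx'].max()

# predict=False keeps every possible prediction window after the cutoff,
# instead of only the last max_prediction_length points of each series
validation = TimeSeriesDataSet.from_dataset(
    training,
    data,
    predict=False,
    stop_randomization=True,
    min_prediction_idx=training_cutoff + 1,
)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)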
It is also unclear to most users whether stop_randomization should be set to True or False depending on the context.
I ran into this as well. I am still not sure whether the increase in batches per epoch caused by setting predict=False is a bug or expected behavior. I'm also not sure whether stop_randomization should be set to True or False.
I would also like a clarification on why a validation set longer than max_prediction_length is not implemented/advised or shown in an example ...
@jdb78 @josesydor @Emungai @polal2is @chefPony
Validating over the last sample of each group sometimes makes the model overfit to that last sample, so I tried the following to validate over a longer sequence. FYI:
validation = TimeSeriesDataSet.from_dataset(
    training,
    data,
    min_encoder_length=max_encoder_length,
    max_encoder_length=max_encoder_length,
    predict=False,
    stop_randomization=True,
    min_prediction_idx=training_cutoff + 1,
)
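And, in case it helps, this is how I then run a standalone validation pass over it (a sketch; tft and trainer stand for an already created TemporalFusionTransformer and pytorch_lightning Trainer, which are not shown here):

val_dataloader = validation.to_dataloader(train=False, batch_size=128, num_workers=0)
# run a full validation epoch over every sequence after the cutoff
trainer.validate(tft, val_dataloader)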
What I want to achieve
Hi everybody, I am trying to fit a temporal fusion transformer model on a training set and, after every x training batches, perform a validation epoch on a separate validation set. The validation epoch should evaluate the model by iterating over the whole validation set, not only over the last time series samples (which, if I understand correctly, is what happens when predict=True is set on a TimeSeriesDataSet).

Expected behavior
I have tried different experiments to achieve the above in pytorch-forecasting, but still without success. In the TFT tutorial the approach is the following:
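As far as I can tell it boils down to something like this (reproduced from memory, so the exact arguments may differ slightly from the tutorial):

# validation set covering only the last max_prediction_length points of each series
validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)

train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)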
However, this is not what I want to accomplish, since it will validate only on the last sequences of the training data. My guess was that, to do what I want, I should do something like:
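Something along these lines (a sketch of my attempt; training_cutoff stands for the last time_idx included in the training set):

# keep every possible validation sequence after the cutoff, not just the last one per group
validation = TimeSeriesDataSet.from_dataset(
    training,
    data,
    predict=False,
    stop_randomization=True,
    min_prediction_idx=training_cutoff + 1,
)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)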
And this should have worked as usual, by running a validation epoch on the val_dataset:
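Roughly like this, relying on pytorch_lightning's val_check_interval to trigger a validation epoch every x training batches (tft is the TemporalFusionTransformer built from the training dataset; the numbers are placeholders):

import pytorch_lightning as pl

trainer = pl.Trainer(
    max_epochs=30,
    gradient_clip_val=0.1,
    val_check_interval=500,   # run a validation epoch every 500 training batches
)
# positional args: model, train dataloader(s), val dataloader(s)
trainer.fit(tft, train_dataloader, val_dataloader)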
Actual behavior
Unexpectedly, setting predict=False in the validation dataset somehow makes the number of batches in each training epoch grow, and by a lot. Is this expected?

Code to reproduce the problem