Open habberr opened 1 year ago
It seems that the reason for this behaviour is that the `n_validation_samples` parameter of `DeepAR` doesn't work the same way as the `n_samples` parameter of `DeepAR.predict()`.

`DeepAR.predict()` runs `n_samples` autoregressive simulations of `decoder_lengths` steps each. `n_validation_samples` controls the number of samples drawn from the loss distribution at each of the `decoder_lengths` steps; however, the parameters of the loss distribution are obtained not in an autoregressive fashion but from 1-step-ahead predictions (the true value at the previous step is used as the input of the current step). @jdb78 could you please check if that makes sense?

If my understanding is correct, then as of now the validation metrics from `pl.Trainer` are not very useful when you are interested in more than a 1-step-ahead forecast.
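To illustrate why the two evaluation modes give systematically different numbers, here is a toy sketch in pure NumPy (not pytorch-forecasting code; the AR(1) process and the misspecified coefficient are made up for illustration). The "1-step-ahead" evaluation always conditions on the true previous value, while the "rollout" feeds the model's own predictions back in, so its errors compound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy AR(1) process: y_t = 0.9 * y_{t-1} + noise
true_coef = 0.9
n = 400
y = np.zeros(n)
for t in range(1, n):
    y[t] = true_coef * y[t - 1] + rng.normal(scale=0.1)

# A slightly misspecified model, standing in for an imperfect trained DeepAR
model_coef = 0.8

# 1-step-ahead ("teacher forced") predictions: every step sees the TRUE
# previous value -- analogous to how n_validation_samples evaluates.
one_step_preds = model_coef * y[:-1]
one_step_mae = np.mean(np.abs(y[1:] - one_step_preds))

# Autoregressive rollout: every step feeds back the model's OWN previous
# prediction -- analogous to DeepAR.predict() simulations.
horizon = 20
rollout_errs = []
for start in range(60, 360, 20):
    prev = y[start - 1]
    for h in range(horizon):
        prev = model_coef * prev          # prediction error compounds here
        rollout_errs.append(abs(y[start + h] - prev))
rollout_mae = np.mean(rollout_errs)

print(f"1-step-ahead MAE:    {one_step_mae:.3f}")
print(f"{horizon}-step rollout MAE: {rollout_mae:.3f}")
# The rollout MAE is typically noticeably larger: errors accumulate,
# which the teacher-forced validation metric never sees.
```

This is only a conceptual analogy, but it matches the pattern reported below: the teacher-forced metric is systematically optimistic relative to a true multi-step forecast.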
Expected behavior
I am training a DeepAR model and monitoring its performance on the validation dataset during training in Tensorboard. Once training finishes, I load the saved checkpoint and evaluate the model again on the validation dataset. I expect to obtain similar metric values in Tensorboard and when evaluating the model afterwards (obviously they might not be exactly the same due to sampling, but they shouldn't differ much).
Actual behavior
Seems that the "after training" numbers (from `trainer.callback_metrics`) are quite different from the "during training" ones. Repeatedly re-sampling the "after training" results shows that the difference cannot be due to random sampling -- the "after training" values are systematically worse than the "during training" ones.
Code to reproduce the problem
This problem can be reproduced using this tutorial example: https://pytorch-forecasting.readthedocs.io/en/stable/tutorials/deepar.html
Code from tutorial with minor changes
```python
import os
import warnings

warnings.filterwarnings("ignore")

import matplotlib.pyplot as plt
import pandas as pd
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping
import torch

from pytorch_forecasting import Baseline, DeepAR, TimeSeriesDataSet
from pytorch_forecasting.data import NaNLabelEncoder
from pytorch_forecasting.data.examples import generate_ar_data
from pytorch_forecasting.metrics import SMAPE, MultivariateNormalDistributionLoss

# generate data
data = generate_ar_data(seasonality=10.0, timesteps=400, n_series=100, seed=42)
data["static"] = 2
data["date"] = pd.Timestamp("2020-01-01") + pd.to_timedelta(data.time_idx, "D")
data.head()
data = data.astype(dict(series=str))

# create dataset and dataloaders
max_encoder_length = 60
max_prediction_length = 20

training_cutoff = data["time_idx"].max() - max_prediction_length

context_length = max_encoder_length
prediction_length = max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="value",
    categorical_encoders={"series": NaNLabelEncoder().fit(data.series)},
    group_ids=["series"],
    static_categoricals=[
        "series"
    ],  # as we plan to forecast correlations, it is important to use series characteristics (e.g. a series identifier)
    time_varying_unknown_reals=["value"],
    max_encoder_length=context_length,
    max_prediction_length=prediction_length,
)

validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff + 1)
batch_size = 128
# synchronize samples in each batch over time - only necessary for DeepVAR, not for DeepAR
train_dataloader = training.to_dataloader(
    train=True, batch_size=batch_size, num_workers=0, batch_sampler="synchronized"
)
val_dataloader = validation.to_dataloader(
    train=False, batch_size=batch_size, num_workers=0, batch_sampler="synchronized"
)
```

PS: I am not sure whether this is a pytorch-forecasting issue, a pytorch-lightning issue, or it's just that I'm doing something wrong :)