zalandoresearch / pytorch-ts

PyTorch-based probabilistic time series forecasting framework built on the GluonTS backend
MIT License

Reproducibility issue in TimeGrad with ver-0.7.0 #152

Open nonconvexopt opened 8 months ago

nonconvexopt commented 8 months ago

I installed pytorch-ts via git clone and checked out the ver-0.7.0 branch. I worked through all the issues in running the TimeGrad model on the electricity dataset, resolving the differences from using diffusers instead of the self-implemented diffusion models. However, the training loss (and the validation loss as well) never reaches the ~0.07 reported in the timegrad-electricity example; I get a minimum of about 0.2 even with the known hyperparameter setting. Even after tuning the hyperparameters extensively (increasing the number of training steps (diffusion steps), tuning the learning rate, and so on), I get almost the same result. I suspect there are some issues in the current adaptation of the diffusers library.

Can you update the timegrad-electricity example for ver-0.7.0?

nonconvexopt commented 8 months ago

[image]

It looks like the predictions are clustered around 0.

nonconvexopt commented 8 months ago

I applied DEISMultistepScheduler following #145; however, the validation loss does not go below 0.3. @ProRedCat Can you share the details of your setting? Below is my code (a sketch of the training call follows it):

# Imports assumed from the ver-0.7.0 branch and the diffusers library:
from diffusers import DEISMultistepScheduler
from pts.model.time_grad import TimeGradEstimator

estimator = TimeGradEstimator(
    num_layers=2,
    hidden_size=40,
    lr=1e-3,
    weight_decay=0,
    batch_size=32,
    num_batches_per_epoch=100,
    prediction_length=elec_dataset.metadata.prediction_length,
    context_length=elec_dataset.metadata.prediction_length,
    input_size=370,
    freq=elec_dataset.metadata.freq,
    scaling=True,
    scheduler=DEISMultistepScheduler(
        num_train_timesteps=150,
        beta_end=0.1,
    ),
    trainer_kwargs={
        "max_epochs": 200,
    },
)
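
Roughly, the surrounding training call looks like this (a minimal sketch; dataset_train and dataset_val are illustrative stand-ins for the grouped multivariate splits the example builds from elec_dataset, not exact variable names):

# Hypothetical splits; the timegrad-electricity example builds grouped
# multivariate train/validation datasets from elec_dataset before this step.
predictor = estimator.train(
    training_data=dataset_train,
    validation_data=dataset_val,  # source of the validation loss quoted above
)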
ProRedCat commented 8 months ago

Here are my parameters. The loss gets to 0.277, but the paper does not use loss for its evaluation; it uses CRPS-Sum, and I get CRPS-Sum: 0.018305275885916936 for this model. The code for that should be at the bottom of the example (a minimal sketch follows the parameters below).

scheduler = DEISMultistepScheduler(
    num_train_timesteps=150,
    beta_end=0.1,
)

estimator = TimeGradEstimator(
    input_size=int(dataset.metadata.feat_static_cat[0].cardinality),
    hidden_size=64,
    num_layers=2,
    dropout_rate=0.1,
    lags_seq=[1],
    scheduler=scheduler,
    num_inference_steps=150,
    prediction_length=dataset.metadata.prediction_length,
    context_length=dataset.metadata.prediction_length,
    freq=dataset.metadata.freq,
    scaling="mean",
    trainer_kwargs=dict(max_epochs=200, accelerator="gpu", devices="1"),
)
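
The evaluation itself is the standard GluonTS multivariate backtest. A minimal sketch, assuming a trained predictor and a grouped multivariate test set dataset_test (the metric key m_sum_mean_wQuantileLoss is the aggregated quantile loss GluonTS reports for the summed series, which the examples report as CRPS-Sum):

import numpy as np

from gluonts.evaluation import MultivariateEvaluator
from gluonts.evaluation.backtest import make_evaluation_predictions

# Draw sample paths from the trained predictor over the test windows.
forecast_it, ts_it = make_evaluation_predictions(
    dataset=dataset_test, predictor=predictor, num_samples=100
)
forecasts = list(forecast_it)
targets = list(ts_it)

# Evaluate per-series quantile losses plus an aggregate over the sum
# of all series; the latter is what gets reported as CRPS-Sum.
evaluator = MultivariateEvaluator(
    quantiles=(np.arange(1, 20) / 20.0), target_agg_funcs={"sum": np.sum}
)
agg_metric, _ = evaluator(targets, forecasts, num_series=len(dataset_test))
print("CRPS-Sum:", agg_metric["m_sum_mean_wQuantileLoss"])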
ProRedCat commented 8 months ago

For reference, here are the graphs from that model:

[image]

nonconvexopt commented 8 months ago

Your result looks completely fine. Thank you for sharing. I will try to reproduce it.

coding-loong commented 8 months ago

Hi, do you know how to set the right hyperparameters for TimeGrad on the Solar and Wikipedia datasets to get results consistent with the paper?

nonconvexopt commented 8 months ago

@ProRedCat If we use DEISMultistepScheduler, does that mean we are effectively using a slightly more advanced version of ScoreGrad: Multivariate Probabilistic Time Series Forecasting with Continuous Energy-based Generative Models?
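
To make the comparison concrete, here is a minimal sketch of the two scheduler choices in diffusers (both share the same noise-prediction training objective; only the sampling solver differs, which is the crux of my question):

from diffusers import DDPMScheduler, DEISMultistepScheduler

# Ancestral DDPM sampling, as in the original TimeGrad setup.
ddpm = DDPMScheduler(num_train_timesteps=150, beta_end=0.1)

# DEIS: a fast multistep ODE-based solver over the same trained
# diffusion model; training is unchanged, only inference differs.
deis = DEISMultistepScheduler(num_train_timesteps=150, beta_end=0.1)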

kashif commented 8 months ago

Sorry folks, I am traveling this week... I will try to have a look next week.