Bug: Float/Double dtypes mismatch when running predict() with DeepAR on Traffic

eivistr commented 3 years ago

PyTorch-Forecasting version: 0.9
PyTorch version: 1.9.0
Python version: 3.8
Operating System: Windows 10

Expected behavior

I am trying to run the Electricity and Traffic experiments in roughly the same way as in the DeepAR paper. Training and forecasting work fine on the Electricity set. On Traffic, however, I get a type error when predicting on the test set, caused by a mismatch between Float and Double values. As a note on the Traffic dataset, the values are floats between 0.0 and 1.0, so they might be converted to Double at some point during scaling?

I have tested the Traffic set with the exact same configuration and an otherwise identical setup using the Temporal Fusion Transformer, and the error does not occur. It appears only when using DeepAR on Traffic.

The problem is persistent on both CPU and GPU.

Code to reproduce the problem

TimeSeriesDataSet:

    def _traffic_dataset(self, df):
        dataset = TimeSeriesDataSet(
            df,
            time_idx="hours_from_start",
            target="occupancy",
            group_ids=["group_ids"],
            max_encoder_length=168,
            max_prediction_length=24,
            static_categoricals=["group_ids"],
            time_varying_known_reals=["day_of_week", "hour", "hours_from_start"],
            time_varying_unknown_reals=["occupancy"],
            target_normalizer=TorchNormalizer(method="standard", center=True),
        )
        return dataset

DeepAR model definition:

    model = DeepAR.from_dataset(
        train,
        learning_rate=0.001,
        cell_type='LSTM',
        hidden_size=40,
        rnn_layers=3,
        dropout=0.1,
        loss=NormalDistributionLoss(),
        optimizer='adam'
    )

Traceback:

Traceback (most recent call last):
  File "run_exp.py", line 129, in <module>
    main()
  File "run_exp.py", line 119, in main
    run_experiment(f, args)
  File "run_exp.py", line 79, in run_experiment
    test_result = run_model(loader, config)
  File "/cluster/home/USER/PROJECT5_hpo/models/deepar/model.py", line 40, in run_deepar_model
    test_result = get_forecasts(model, test_dl, config['fast_dev_run'])
  File "/cluster/home/USER/PROJECT5_hpo/models/deepar/model.py", line 47, in get_forecasts
    predictions, inputs = model.predict(dataloader, mode="prediction", return_x=True, fast_dev_run=fast_dev_run)
  File "/cluster/home/USER/.local/lib/python3.8/site-packages/pytorch_forecasting/models/deepar/__init__.py", line 406, in predict
    return super().predict(
  File "/cluster/home/USER/.local/lib/python3.8/site-packages/pytorch_forecasting/models/base_model.py", line 1059, in predict
    out = self(x, **kwargs)  # raw output is dictionary
  File "/cluster/home/USER/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/cluster/home/USER/.local/lib/python3.8/site-packages/pytorch_forecasting/models/deepar/__init__.py", line 322, in forward
    output = self.decode(
  File "/cluster/home/USER/.local/lib/python3.8/site-packages/pytorch_forecasting/models/deepar/__init__.py", line 292, in decode
    output = self.decode_autoregressive(
  File "/cluster/home/USER/.local/lib/python3.8/site-packages/pytorch_forecasting/models/base_model.py", line 1808, in decode_autoregressive
    current_target, current_hidden_state = decode_one(
  File "/cluster/home/USER/.local/lib/python3.8/site-packages/pytorch_forecasting/models/deepar/__init__.py", line 283, in decode_one
    x[:, 0, target_pos] = lagged_targets[-1]
RuntimeError: Index put requires the source and destination dtypes match, got Float for the destination and Double for the source.

eivistr commented 3 years ago

As a pragmatic and obviously bad workaround, changing line 283 in pytorch_forecasting/models/deepar/__init__.py from

    x[:, 0, target_pos] = lagged_targets[-1]

to

    x[:, 0, target_pos] = lagged_targets[-1].to(torch.float32)

fixes the problem temporarily. Might want to find the root cause of the problem and why the tensor has dtype Double in the first place.
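
The mismatch and the cast can be reproduced outside the library. The following is a minimal sketch, not the library's code: the tensor shapes and the names `x`, `target_pos`, and `lagged_target` are hypothetical stand-ins that mirror the line in the traceback, and it assumes `target_pos` is an index tensor so that the assignment goes through an index-put. Whether the uncast assignment raises, and with which exception type, may depend on the PyTorch version, so the sketch only records it.

```python
import torch

# Hypothetical stand-ins for the decoder input and the last lagged target.
x = torch.zeros(4, 1, 3, dtype=torch.float32)          # batch x time x features
target_pos = torch.tensor([0])                          # index tensor -> advanced indexing
lagged_target = torch.rand(4, 1, dtype=torch.float64)   # float64 source, as in the report

try:
    # float64 source assigned into a float32 destination via an index tensor
    x[:, 0, target_pos] = lagged_target
    raised = False
except (RuntimeError, IndexError) as err:
    raised = True
    print(err)

# Casting the source to the destination dtype makes the assignment valid.
x[:, 0, target_pos] = lagged_target.to(x.dtype)
print(raised, x.dtype)
```

Casting to `x.dtype` instead of a hard-coded `torch.float32` is a small variation on the workaround above that keeps the source aligned with whatever dtype the destination uses.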

1Reinier commented 3 years ago

Same issue here on an unrelated dataset. It trains but in post-validation (predict_mode) it errors out as above.

eivistr commented 3 years ago

I have had the error on two other datasets since then as well, so it does not appear to be isolated to the Traffic set.

lukemerrick commented 2 years ago

I think I found (and fixed!) the root cause of this issue: adding the small eps value to avoid division-by-zero issues with zero standard deviation ended up tricking numpy into promoting 32-bit floats to 64-bit floats. This causes the TorchTransformer to use 64-bit floats in its calculations, so rescaling a 32-bit input generates a 64-bit output and raises the error above.

Here's the fix: https://github.com/jdb78/pytorch-forecasting/pull/795/commits/a010ef9788c6dce4d4cc829ce09bac1d34230639
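
The kind of promotion described above can be illustrated in isolation. This is a minimal sketch and not the code from the linked commit: the names `scale` and `eps` and their values are hypothetical, and `eps` is held in a float64 array so that the result does not depend on NumPy's version-specific rules for Python scalars.

```python
import numpy as np

# Hypothetical per-series standard deviations stored as float32.
scale = np.array([0.05, 0.0], dtype=np.float32)

# A float64 epsilon (NumPy's default float dtype) guarding against zero std.
eps = np.array([1e-8])

promoted = scale + eps                    # float32 + float64 array -> float64
kept = scale + eps.astype(scale.dtype)    # eps cast first -> result stays float32

print(promoted.dtype)  # float64
print(kept.dtype)      # float32
```

Casting the epsilon to the dtype of the scale before adding it is one way to keep the normalization parameters, and anything rescaled with them, in 32-bit precision.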

GeorgeG92 commented 2 years ago

I have the exact same issue on a different dataset, training/validation with DeepAR works fine but predicting yields the error mentioned above about the type mismatch. Are there plans for the fix above (or something similar) being integrated in the master branch?

sktime / pytorch-forecasting

Bug: Float/Double dtypes mismatch when running predict() with DeepAR on Traffic #574
