jdb78 / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License
3.85k stars 609 forks

RuntimeError: input.size(-1) must be equal to input_size. Expected 37, got 38 using deepAR model #739

Closed emiled16 closed 2 years ago

emiled16 commented 2 years ago

Expected behavior

I am training a DeepAR model. When I run the fit method, I expect the trainer to complete training without encountering any error.

Actual behavior

However, I get the following error

RuntimeError: input.size(-1) must be equal to input_size. Expected 37, got 38

After further investigation, I noticed that if x, y = next(iter(train_dataloader)) and x_cont = x['encoder_cont'], then x_cont.shape[-1] differs from len(deepAR.reals).

I don't understand why this is happening, especially since I am instantiating the DeepAR model using the .from_dataset() method. That should ensure the shapes match, but it is not the case.
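As a minimal, library-free sketch of one way such an off-by-one can arise: if the dataset were to emit one tensor column per listed feature name (duplicates included) while the model derived its input size from the set of unique names, the widths would disagree by exactly one. This is an illustrative assumption about the mechanism, not confirmed pytorch-forecasting internals:

```python
# Hypothetical illustration (plain Python, no pytorch-forecasting needed) of how
# a duplicated feature name could make the batch tensor one column wider than
# the model's expected input_size.
time_varying_known_reals = ["time_idx", "temperature", "time_idx"]  # duplicate!
time_varying_unknown_reals = ["Consumption"]

# Assumed: the dataset stacks one column per listed entry, duplicates kept
tensor_width = len(time_varying_known_reals) + len(time_varying_unknown_reals)

# Assumed: the model counts unique variable names when sizing its RNN input
model_input_size = len(set(time_varying_known_reals) | set(time_varying_unknown_reals))

print(tensor_width, model_input_size)  # 4 3 -> an "Expected N, got N+1" mismatch
```

Under these assumptions, a single repeated name reproduces the "Expected 37, got 38" pattern seen in the traceback.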

Code to reproduce the problem

training = TimeSeriesDataSet(
    train_data,
    time_idx="time_idx",
    target="Consumption",
    group_ids=["LCLid"],
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals=["LCLid"],
    static_reals=[],
    time_varying_known_categoricals=['precipType', 'sun', 'day_of_week', 'month'],
    variable_groups={},  # group of categorical variables can be treated as one variable
    time_varying_known_reals=["time_idx"] + cont_var,
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=["Consumption"],
    # add_relative_time_idx=True,
    add_target_scales=False,
    # add_encoder_length=True,
    allow_missing_timesteps=False,
    predict_mode=False
)

batch_size = 128  # set this between 32 to 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=2)

trainer = pl.Trainer(
    max_epochs=30,
    gpus=1,
    weights_summary="top",
    gradient_clip_val=0.1,
    limit_train_batches=30,  # comment in for training, running validation every 30 batches
    # fast_dev_run=True,  # comment in to check that network or dataset has no serious bugs
    callbacks=[lr_logger, early_stop_callback],
    logger=logger,
)

deepAR = DeepAR.from_dataset(
    training,
    cell_type='LSTM',
    hidden_size=10,
    rnn_layers=2,
    learning_rate=0.03,
    dropout=0.1,
    loss=NormalDistributionLoss(),
    log_interval=10,
    reduce_on_plateau_patience=4,
)

res = trainer.fit(
    deepAR,
    train_dataloader,
    val_dataloader
)

Above are the relevant parts of the code; I used the same method to initialize the validation dataloader. Here is a link to the colab in case it helps: https://colab.research.google.com/drive/1SCHTGg4JpbCBpINgS1LFXPRxVDV0GVoh?usp=sharing

Finally, here's the traceback:

RuntimeError                              Traceback (most recent call last)
<ipython-input-11-77021528cc48> in <module>()
      2     deepAR,
      3     train_dataloader,
----> 4    val_dataloader
      5 )

24 frames
/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloaders, val_dataloaders, datamodule, train_dataloader)
    550         self.checkpoint_connector.resume_start()
    551 
--> 552         self._run(model)
    553 
    554         assert self.state.stopped

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in _run(self, model)
    920 
    921         # dispatch `start_training` or `start_evaluating` or `start_predicting`
--> 922         self._dispatch()
    923 
    924         # plugin will finalized fitting (e.g. ddp_spawn will load trained model)

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in _dispatch(self)
    988             self.accelerator.start_predicting(self)
    989         else:
--> 990             self.accelerator.start_training(self)
    991 
    992     def run_stage(self):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py in start_training(self, trainer)
     90 
     91     def start_training(self, trainer: "pl.Trainer") -> None:
---> 92         self.training_type_plugin.start_training(trainer)
     93 
     94     def start_evaluating(self, trainer: "pl.Trainer") -> None:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in start_training(self, trainer)
    159     def start_training(self, trainer: "pl.Trainer") -> None:
    160         # double dispatch to initiate the training loop
--> 161         self._results = trainer.run_stage()
    162 
    163     def start_evaluating(self, trainer: "pl.Trainer") -> None:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in run_stage(self)
    998         if self.predicting:
    999             return self._run_predict()
-> 1000         return self._run_train()
   1001 
   1002     def _pre_training_routine(self):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in _run_train(self)
   1033             self.progress_bar_callback.disable()
   1034 
-> 1035         self._run_sanity_check(self.lightning_module)
   1036 
   1037         # enable train mode

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in _run_sanity_check(self, ref_model)
   1120             # run eval step
   1121             with torch.no_grad():
-> 1122                 self._evaluation_loop.run()
   1123 
   1124             self.on_sanity_check_end()

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py in run(self, *args, **kwargs)
    109             try:
    110                 self.on_advance_start(*args, **kwargs)
--> 111                 self.advance(*args, **kwargs)
    112                 self.on_advance_end()
    113                 self.iteration_count += 1

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py in advance(self, *args, **kwargs)
    109 
    110         dl_outputs = self.epoch_loop.run(
--> 111             dataloader_iter, self.current_dataloader_idx, dl_max_batches, self.num_dataloaders
    112         )
    113 

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py in run(self, *args, **kwargs)
    109             try:
    110                 self.on_advance_start(*args, **kwargs)
--> 111                 self.advance(*args, **kwargs)
    112                 self.on_advance_end()
    113                 self.iteration_count += 1

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py in advance(self, dataloader_iter, dataloader_idx, dl_max_batches, num_dataloaders)
    109         # lightning module methods
    110         with self.trainer.profiler.profile("evaluation_step_and_end"):
--> 111             output = self.evaluation_step(batch, batch_idx, dataloader_idx)
    112             output = self.evaluation_step_end(output)
    113 

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py in evaluation_step(self, batch, batch_idx, dataloader_idx)
    156             self.trainer.lightning_module._current_fx_name = "validation_step"
    157             with self.trainer.profiler.profile("validation_step"):
--> 158                 output = self.trainer.accelerator.validation_step(step_kwargs)
    159 
    160         return output

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py in validation_step(self, step_kwargs)
    209         """
    210         with self.precision_plugin.val_step_context(), self.training_type_plugin.val_step_context():
--> 211             return self.training_type_plugin.validation_step(*step_kwargs.values())
    212 
    213     def test_step(self, step_kwargs: Dict[str, Union[Any, int]]) -> Optional[STEP_OUTPUT]:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in validation_step(self, *args, **kwargs)
    176 
    177     def validation_step(self, *args, **kwargs):
--> 178         return self.model.validation_step(*args, **kwargs)
    179 
    180     def test_step(self, *args, **kwargs):

/usr/local/lib/python3.7/dist-packages/pytorch_forecasting/models/base_model.py in validation_step(self, batch, batch_idx)
    368     def validation_step(self, batch, batch_idx):
    369         x, y = batch
--> 370         log, out = self.step(x, y, batch_idx)
    371         log.update(self.create_log(x, y, out, batch_idx))
    372         return log

/usr/local/lib/python3.7/dist-packages/pytorch_forecasting/models/base_model.py in step(self, x, y, batch_idx, **kwargs)
    490             loss = loss * (1 + monotinicity_loss)
    491         else:
--> 492             out = self(x, **kwargs)
    493 
    494             # calculate loss

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/pytorch_forecasting/models/deepar/__init__.py in forward(self, x, n_samples)
    306         Forward network
    307         """
--> 308         hidden_state = self.encode(x)
    309         # decode
    310         input_vector = self.construct_input_vector(

/usr/local/lib/python3.7/dist-packages/pytorch_forecasting/models/deepar/__init__.py in encode(self, x)
    232         input_vector = self.construct_input_vector(x["encoder_cat"], x["encoder_cont"])
    233         _, hidden_state = self.rnn(
--> 234             input_vector, lengths=encoder_lengths, enforce_sorted=False
    235         )  # second ouput is not needed (hidden state)
    236         return hidden_state

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/pytorch_forecasting/models/nn/rnn.py in forward(self, x, hx, lengths, enforce_sorted)
    109                         x, pack_lengths.cpu(), enforce_sorted=enforce_sorted, batch_first=self.batch_first
    110                     ),
--> 111                     hx=hx,
    112                 )
    113                 # replace hidden cell with initial input if encoder_length is zero to determine correct initial state

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
    675             hx = self.permute_hidden(hx, sorted_indices)
    676 
--> 677         self.check_forward_args(input, hx, batch_sizes)
    678         if batch_sizes is None:
    679             result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py in check_forward_args(self, input, hidden, batch_sizes)
    618                            batch_sizes: Optional[Tensor],
    619                            ):
--> 620         self.check_input(input, batch_sizes)
    621         self.check_hidden_size(hidden[0], self.get_expected_hidden_size(input, batch_sizes),
    622                                'Expected hidden[0] size {}, got {}')

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py in check_input(self, input, batch_sizes)
    205             raise RuntimeError(
    206                 'input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
--> 207                     self.input_size, input.size(-1)))
    208 
    209     def get_expected_hidden_size(self, input: Tensor, batch_sizes: Optional[Tensor]) -> Tuple[int, int, int]:

RuntimeError: input.size(-1) must be equal to input_size. Expected 37, got 38
Mahima-ai commented 2 years ago

Hey, I am also facing the same issue.

emiled16 commented 2 years ago

I found my error: I had the variable time_idx twice in the list time_varying_known_reals (it was passed explicitly and also appeared in cont_var). You may have a similar problem. I suggest removing all covariates and adding them back one by one while verifying that no error occurs. Hope this helps.
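To catch this class of bug up front, one could scan the feature lists for repeated names before constructing the TimeSeriesDataSet. A small sketch (find_duplicate_features is a hypothetical helper, not part of pytorch-forecasting):

```python
from collections import Counter

def find_duplicate_features(*feature_lists):
    """Return names appearing more than once across the given feature lists."""
    counts = Counter(name for names in feature_lists for name in names)
    return [name for name, n in counts.items() if n > 1]

# Mirrors the bug: "time_idx" passed explicitly and again inside cont_var
cont_var = ["temperature", "time_idx"]
time_varying_known_reals = ["time_idx"] + cont_var
time_varying_unknown_reals = ["Consumption"]

print(find_duplicate_features(time_varying_known_reals, time_varying_unknown_reals))
# ['time_idx']
```

An assert on the helper's result being empty, placed just before the dataset is built, would fail fast with the offending column name instead of a shape error deep inside the RNN.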