nejox opened this issue 7 months ago
I was able to recreate your problem.
I changed `return torch.cat(sequences, dim=1)` to `return torch.cat(sequences, dim=0)` in `pytorch_forecasting/utils.py` line 249, and it no longer raises the error when `val_batch_size=128` in this example. After concatenating along dim=0, the resulting tensor has shape (350, 6), the same as when `val_batch_size=1280`. This also seems to be what `rnn.pack_sequence()` does for `rnn.PackedSequence` inputs in line 247.
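A minimal sketch of the dimension difference (the tensor shapes are assumed from the numbers above, not taken from actual model output): three validation batches of 128, 128, and 94 samples concatenate cleanly along dim=0, while dim=1 requires every batch to have the same size in dim 0 and fails on the short last batch.

```python
import torch

# Assumed shapes: val_batch_size=128 over 350 validation samples,
# so the last batch holds only 94 samples.
batches = [torch.zeros(128, 6), torch.zeros(128, 6), torch.zeros(94, 6)]

# Along dim=0 the batches stack back into one (350, 6) tensor,
# matching the single-batch (val_batch_size=1280) case:
full = torch.cat(batches, dim=0)
print(full.shape)  # torch.Size([350, 6])

# Along dim=1 every non-concatenated dimension must match, so the
# uneven last batch (94 != 128) raises a RuntimeError:
try:
    torch.cat(batches, dim=1)
except RuntimeError as e:
    print("dim=1 fails:", e)
```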
Let me know if this works for you.
Thanks! This solves the error. This seems like a major bug, as it should appear in almost every scenario where you use multiple validation batches, right?
I'm also wondering whether overwriting the `drop_last` parameter in the LightningModule makes sense, but that's something else...
I have not spent a lot of time making predictions using the val_dataloader and just kept the defaults from the tutorial; maybe that is why this has not been encountered before. I haven't had this issue when using `tft.predict` on new/future prediction data (using the prediction-data format from the tutorial), but I have only done that one batch at a time. I will have to look into it more.
Hi @nejox, I ran into the exact same error, thanks very much for sharing. I wonder how you managed to fix it, as I don't see the fix merged into the master branch, and no newer version has been released.
Hi @fazaki, I didn't really fix that error in my case. For some tests I applied the patch from pull #1511 manually, but in the end I switched to Darts.
Oh, I see, Darts was my backup plan indeed. I tried installing the forked repo by Luke and it worked:

```shell
pip install git+https://github.com/Luke-Chesley/pytorch-forecasting.git@master
```
Thanks @nejox
Expected Behavior
I executed the TemporalFusionTransformer tutorial code to forecast demand on the Tutorial Dataset. I expected the model to train without issues and validate across multiple batches.
Actual Behavior
The tutorial's batch size configuration results in only one validation batch, which initially masks the error. When the validation DataLoader splits the dataset into multiple batches, with the last batch containing fewer samples than the specified batch size, I encountered a `RuntimeError` related to a tensor size mismatch. Attempting to set `drop_last=True` did not resolve the issue because this setting is overridden when the mode is set to "PREDICTING", as seen here in the PyTorch Lightning codebase. It appears to me that the concatenation dimension may be incorrectly specified here in the PyTorch Forecasting codebase.
Manually forcing `drop_last=True` to stay (or making all batches the same size) led to a mismatch between the dimensions of `predict()`'s `output` and `y` attributes, further indicating that the issue lies in the dimension specified for concatenation.
Code to reproduce the problem
The issue is reproduced in this Colab notebook.
The snippet of it setting the batch size:
leads to