Open erlebach opened 2 years ago
There should be an exact error message at the end of the traceback text. The error message may be incomplete.
Please provide more information so that we can reproduce the problem:
Describe the bug: A clear and concise description of what the bug is.
To Reproduce: Steps to reproduce the behavior.
Operating environment:
Additional context: Add any other context about the problem here.
Thanks for the reply. Here is more detail. I am running on Pop!_OS 22.04. Here is the output of "uname -a":
Linux pop-os 5.17.5-76051705-generic #202204271406~1655476786~22.04~62dd706 SMP PREEMPT Fri Jun 17 16 x86_64 x86_64 x86_64 GNU/Linux
with the following versions of torch libraries:
torch 1.10.0+cu113
torch-cluster 1.5.9
torch-geometric 2.0.1
torch-scatter 2.0.9
torch-sparse 0.6.12
torch-spline-conv 1.2.1
I get an error (note that I get no error when running `dien.py`) with the command:
python run_dien.py
It produces the following output:
cuda ready...
cuda:0
Train on 4 samples, validate on 0 samples, 2 steps per epoch
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "run_dien.py", line 68, in <module>
    history = model.fit(x, y, batch_size=2, epochs=10, verbose=1, validation_split=0, shuffle=False)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/basemodel.py", line 245, in fit
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/dien.py", line 92, in forward
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/dien.py", line 220, in forward
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/utils/rnn.py", line 249, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)
RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
For reference, my GPU is an NVIDIA GeForce RTX 3080 Ti, with CUDA 11.4 and driver 470.103.01.
Here is the solution. First, the problem only occurs when running on the GPU. Second, to fix it, edit the torch file .../torch/nn/utils/rnn.py: replace `lengths` with `lengths.cpu()` in the call to `_VF._pack_padded_sequence`. I got this information from https://github.com/pytorch/pytorch/issues/43227 .
Yes, adding `.cpu()` works for me, too. See https://github.com/shenweichen/DeepCTR-Torch/issues/240 for more details.
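For anyone who would rather not edit the installed torch file, the same conversion can be applied where `pack_padded_sequence` is called. The snippet below is only a minimal sketch of that idea, not the DeepCTR-Torch code: the tensor names and shapes (`padded`, `lengths`, a batch of 2 sequences) are made up for illustration, and it falls back to the CPU when no GPU is available.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Use the GPU when present; the snippet also runs on a CPU-only machine.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical batch: 2 padded sequences, max length 3, feature size 4.
padded = torch.randn(2, 3, 4, device=device)

# Sequence lengths as an int64 tensor that may live on the GPU,
# which is the situation reported in the traceback above.
lengths = torch.tensor([3, 2], device=device)

# Move only the lengths to the CPU; the data tensor stays on its device.
packed = pack_padded_sequence(
    padded, lengths.cpu(), batch_first=True, enforce_sorted=False
)
```

On a CPU-only machine `lengths.cpu()` simply returns the tensor unchanged, so the call behaves the same with or without a GPU.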
After installing DeepCTR via `python setup.py install`, I get the following error: Any help is appreciated.