shenweichen / DeepCTR-Torch

【PyTorch】Easy-to-use, Modular and Extendible package of deep-learning based CTR models.
https://deepctr-torch.readthedocs.io/en/latest/index.html
Apache License 2.0

Error when running run_dien.py #249

Open erlebach opened 2 years ago

erlebach commented 2 years ago

After installing DeepCTR-Torch via `python setup.py install`, running `run_dien.py` gives the following error:

cuda ready...
cuda:0
Train on 4 samples, validate on 0 samples, 2 steps per epoch
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "run_dien.py", line 68, in <module>
    history = model.fit(x, y, batch_size=2, epochs=10, verbose=1, validation_split=0, shuffle=False)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/basemodel.py", line 245, in fit
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/dien.py", line 92, in forward
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/dien.py", line 220, in forward
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/utils/rnn.py", line 249, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)

Any help is appreciated.

zanshuxun commented 2 years ago
  1. There should be an exact error message at the end of the traceback. The traceback you posted may be incomplete.

  2. Please provide more information so that we can reproduce the issue:

Describe the bug: A clear and concise description of what the bug is.

To Reproduce: Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Operating environment:

Additional context: Add any other context about the problem here.
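The "Operating environment" details the template asks for can be gathered in one go with a short script. This is only a sketch: it assumes the package is registered under the distribution name `deepctr-torch` and falls back to "unknown" if that metadata lookup fails; the function name `describe_environment` is hypothetical.

```python
import platform


def describe_environment():
    """Hypothetical helper: collect Python, OS, torch, CUDA and DeepCTR-Torch versions."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    try:
        import torch

        info["torch"] = str(torch.__version__)
        # torch.version.cuda is None for CPU-only builds of torch.
        info["torch CUDA build"] = str(torch.version.cuda)
        info["CUDA available"] = str(torch.cuda.is_available())
        if torch.cuda.is_available():
            info["GPU"] = torch.cuda.get_device_name(0)
    except ImportError:
        info["torch"] = "not installed"
    try:
        from importlib.metadata import version  # Python 3.8+

        # Assumption: the distribution is named "deepctr-torch".
        info["deepctr-torch"] = version("deepctr-torch")
    except Exception:
        info["deepctr-torch"] = "unknown"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue covers the Python, torch and CUDA versions in one step.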

erlebach commented 2 years ago

Thanks for the reply. Here are more details. I am running Pop!_OS 22.04. Here is the output of `uname -a`:

Linux pop-os 5.17.5-76051705-generic #202204271406~1655476786~22.04~62dd706 SMP PREEMPT Fri Jun 17 16 x86_64 x86_64 x86_64 GNU/Linux

and the following versions of the torch libraries:

torch                        1.10.0+cu113
torch-cluster                1.5.9
torch-geometric              2.0.1
torch-scatter                2.0.9
torch-sparse                 0.6.12
torch-spline-conv            1.2.1

Note that running `dien.py` on its own produces no error; the error appears only with the command:

python run_dien.py

The full error trace is:

cuda ready...
cuda:0
Train on 4 samples, validate on 0 samples, 2 steps per epoch
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "run_dien.py", line 68, in <module>
    history = model.fit(x, y, batch_size=2, epochs=10, verbose=1, validation_split=0, shuffle=False)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/basemodel.py", line 245, in fit
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/dien.py", line 92, in forward
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/dien.py", line 220, in forward
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/utils/rnn.py", line 249, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)
RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor

For reference, my GPU is an NVIDIA GeForce RTX 3080 Ti, with CUDA 11.4 and driver 470.103.01.

erlebach commented 2 years ago

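Here is the solution. First, the problem occurs only when running on the GPU: as the error message says, `pack_padded_sequence` requires `lengths` to be a 1D CPU int64 tensor, but DIEN passes one that lives on cuda:0. Second, to fix it, edit the torch function `pack_padded_sequence` in `.../torch/nn/utils/rnn.py` and replace `lengths` with `lengths.cpu()` in the call to `_VF._pack_padded_sequence`. I got this information from https://github.com/pytorch/pytorch/issues/43227.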
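Editing the installed torch source works, but the change is lost whenever torch is reinstalled or upgraded, and it affects every project in that environment. The same `.cpu()` idea can instead be applied at the call site, which in DeepCTR-Torch is the `pack_padded_sequence` call inside `dien.py` that appears in the traceback. The sketch below assumes a padded batch of shape (batch, max_len, features); the helper `pack_with_cpu_lengths` is hypothetical and is not part of torch or DeepCTR-Torch.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


def pack_with_cpu_lengths(inputs, lengths, batch_first=True, enforce_sorted=False):
    """Hypothetical helper: pack a padded batch whose `lengths` may be on the GPU.

    pack_padded_sequence expects `lengths` as a 1D CPU int64 tensor (or a list
    of ints) even when `inputs` is on cuda:0, so move it to the CPU first.
    The packed data itself stays on the same device as `inputs`.
    """
    cpu_lengths = lengths.to("cpu", torch.int64)
    return pack_padded_sequence(
        inputs, cpu_lengths, batch_first=batch_first, enforce_sorted=enforce_sorted
    )


device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy batch: 3 sequences padded to length 4, with 8 features per step.
inputs = torch.randn(3, 4, 8, device=device)
lengths = torch.tensor([4, 2, 3], device=device)  # on the GPU when one is available

packed = pack_with_cpu_lengths(inputs, lengths)
print(packed.data.shape)            # torch.Size([9, 8]): 4 + 2 + 3 real steps
print(packed.batch_sizes.tolist())  # [3, 3, 2, 1]

# Unpacking restores the original batch order and lengths.
padded, out_lengths = pad_packed_sequence(packed, batch_first=True)
print(out_lengths.tolist())  # [4, 2, 3]
```

Only the small `lengths` tensor is copied to the CPU; the sequence data stays on the GPU, so the extra transfer is cheap.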

zanshuxun commented 2 years ago

Yes, adding .cpu() works for me, too. See https://github.com/shenweichen/DeepCTR-Torch/issues/240 for more details.