Error when running run_dien.py

erlebach commented 2 years ago

After installation of DeepCTR via python setup.py install, I get the following error:

cuda ready...
cuda:0
Train on 4 samples, validate on 0 samples, 2 steps per epoch
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "run_dien.py", line 68, in <module>
    history = model.fit(x, y, batch_size=2, epochs=10, verbose=1, validation_split=0, shuffle=False)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/basemodel.py", line 245, in fit
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/dien.py", line 92, in forward
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/dien.py", line 220, in forward
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/utils/rnn.py", line 249, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)

Any help is appreciated.

zanshuxun commented 2 years ago

There should be an exact error message at the end of the traceback; the one you posted may be incomplete.
Please provide more information so that we can reproduce the problem:

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

Operating environment:

- python version [e.g. 3.5, 3.6]
- torch version [e.g. 1.6.0, 1.7.0]
- deepctr-torch version [e.g. 0.2.7]

Additional context
Add any other context about the problem here.
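
One possible way to collect the version information requested above is the short script below. It is only a sketch: the helper name `installed_version` is hypothetical, and the script assumes the packages are registered under the distribution names `torch` and `deepctr-torch`.

```python
import platform
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


# Distribution names below are assumptions about how the packages were installed.
report = {
    "python": platform.python_version(),
    "torch": installed_version("torch"),
    "deepctr-torch": installed_version("deepctr-torch"),
}

for name, version in report.items():
    print(f"{name} version: {version}")
```

The printed lines can be pasted directly into the "Operating environment" part of the report.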

erlebach commented 2 years ago

Thanks for the reply. Here is more detail. I am running on Pop!_OS 22.04. Here is the output of "uname -a":

Linux pop-os 5.17.5-76051705-generic #202204271406~1655476786~22.04~62dd706 SMP PREEMPT Fri Jun 17 16 x86_64 x86_64 x86_64 GNU/Linux

with the following versions of torch libraries:

torch                        1.10.0+cu113
torch-cluster                1.5.9
torch-geometric              2.0.1
torch-scatter                2.0.9
torch-sparse                 0.6.12
torch-spline-conv            1.2.1

Running `dien.py` on its own produces no error, but the command

python run_dien.py

produces the following error trace:

cuda ready...
cuda:0
Train on 4 samples, validate on 0 samples, 2 steps per epoch
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "run_dien.py", line 68, in <module>
    history = model.fit(x, y, batch_size=2, epochs=10, verbose=1, validation_split=0, shuffle=False)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/basemodel.py", line 245, in fit
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/dien.py", line 92, in forward
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/deepctr_torch-0.2.8-py3.8.egg/deepctr_torch/models/dien.py", line 220, in forward
  File "/home/erlebach/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/utils/rnn.py", line 249, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)
RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor

For reference, my GPU is an NVIDIA GeForce RTX 3080 Ti, with CUDA 11.4 and driver 470.103.01.

erlebach commented 2 years ago

Here is the solution. First, the problem only occurs when running on the GPU. Second, to fix it, edit the torch file .../torch/nn/utils/rnn.py and replace lengths with lengths.cpu() in the call to _VF._pack_padded_sequence. I got this information from https://github.com/pytorch/pytorch/issues/43227 .
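
As an alternative to editing the installed torch file, the same conversion can probably be applied where pack_padded_sequence is called. The snippet below is a minimal sketch of that idea, not the actual DeepCTR-Torch code: the tensor names, shapes, and values are made up for illustration, and it falls back to the CPU when no GPU is available.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Use the GPU if one is available; the example also runs on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Illustrative data: 2 sequences, padded to length 4, with 3 features each.
batch = torch.randn(2, 4, 3, device=device)
lengths = torch.tensor([4, 2], device=device)  # lives on the GPU when CUDA is used

# pack_padded_sequence expects 'lengths' as a 1D CPU int64 tensor,
# so move it to the CPU at the call site. The data tensor can stay on the GPU.
packed = pack_padded_sequence(
    batch, lengths.cpu(), batch_first=True, enforce_sorted=False
)

print(packed.data.shape)
print(packed.batch_sizes)
```

Passing lengths.cpu() from the caller keeps the installed torch package unmodified, so the fix survives a torch reinstall or upgrade.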

zanshuxun commented 2 years ago

Yes, adding .cpu() works for me, too. See https://github.com/shenweichen/DeepCTR-Torch/issues/240 for more details.

shenweichen / DeepCTR-Torch

Error when running run_dien.py #249