shenweichen / DeepCTR-Torch

【PyTorch】Easy-to-use, Modular and Extendible package of deep-learning based CTR models.
https://deepctr-torch.readthedocs.io/en/latest/index.html
Apache License 2.0

Getting "RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor" #240

Closed: Jeriousman closed this issue 1 year ago

Jeriousman commented 2 years ago

Describe the bug(问题描述):

history = model.fit(x, y, batch_size=256, epochs=20, verbose=1, validation_split=0.4, shuffle=True)

When I call model.fit for the DIEN model with run_dien.py from your default examples, it works when I set the device to cpu, but with cuda I get the error below.

cuda ready...
0it [00:00, ?it/s]cuda:0
Train on 4 samples, validate on 0 samples, 2 steps per epoch

Traceback (most recent call last):

  File "<ipython-input-1-e985ce1c0aa2>", line 69, in <module>
    history = model.fit(x, y, batch_size=2, epochs=10, verbose=1, validation_split=0, shuffle=False)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/deepctr_torch/models/basemodel.py", line 244, in fit
    y_pred = model(x).squeeze()

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/deepctr_torch/models/dien.py", line 92, in forward
    masked_interest, aux_loss = self.interest_extractor(keys_emb, keys_length, neg_keys_emb)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/deepctr_torch/models/dien.py", line 221, in forward
    enforce_sorted=False)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/utils/rnn.py", line 244, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)

RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor

So I tried lengths.cpu() and lengths.to('cpu'), and neither of them solved the problem. Can you please provide a solution?

Operating environment(运行环境):

zanshuxun commented 2 years ago

In newer versions of PyTorch, the lengths argument of torch.nn.utils.rnn.pack_padded_sequence has changed: it now has to be a 1D CPU int64 tensor. (Details can be found in https://github.com/pytorch/pytorch/issues/43227.)

[two screenshots attached]
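
For concreteness, here is a minimal standalone sketch of that workaround (not taken from the thread; the shapes and variable names are invented for illustration): the padded embeddings can stay on the GPU, and only the lengths tensor is moved to the CPU when calling pack_padded_sequence.

import torch
from torch.nn.utils.rnn import pack_padded_sequence

device = "cuda" if torch.cuda.is_available() else "cpu"

# Padded sequence batch on the GPU: (batch, max_len, emb_dim)
keys_emb = torch.randn(4, 6, 8, device=device)
# Per-sample sequence lengths, created on the same device
keys_length = torch.tensor([6, 3, 5, 2], device=device)

# Newer PyTorch requires 'lengths' to be a 1D CPU int64 tensor,
# so it is moved to the CPU right at the call site
packed = pack_padded_sequence(
    keys_emb,
    keys_length.cpu(),
    batch_first=True,
    enforce_sorted=False,
)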

Jeriousman commented 2 years ago

Obviously I tried that. But as I said, none of them worked. I had to go all the way down to torch 1.4.0 to get it working.

zanshuxun commented 2 years ago

> Obviously I tried that. But as I said, none of them worked. I had to go all the way down to torch 1.4.0 to get it working.

Where did you use .cpu()? Did the device of the tensor change after you called .cpu()?

Jeriousman commented 2 years ago

Yes, I did, as I mentioned below:

> So I tried lengths.cpu() and lengths.to('cpu'), and neither of them solved the problem.

The lengths argument is exactly the part I tried to move to the CPU, as mruberry and ngimel suggested there. That issue was also the first page I found when I was trying to fix this problem.

zanshuxun commented 2 years ago

1. Where did you use .cpu()?

Could you tell me the corresponding line number in the code? For example:

https://github.com/shenweichen/DeepCTR-Torch/blob/b4d8181e86c2165722fa9331bc16185832596232/deepctr_torch/models/dien.py#L220-L221

Did you call masked_keys_length.cpu() here?

or other places like https://github.com/shenweichen/DeepCTR-Torch/blob/b4d8181e86c2165722fa9331bc16185832596232/deepctr_torch/models/dien.py#L356

or

https://github.com/shenweichen/DeepCTR-Torch/blob/b4d8181e86c2165722fa9331bc16185832596232/deepctr_torch/models/dien.py#L365

2. Did the device of the tensor change after you used .cpu()?

Could you print the device of the tensor before and after your .cpu() call, to check whether it actually takes effect? If it does, there should not be the error "RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor".
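
A quick check along these lines would show it (purely illustrative; the tensor below is a stand-in for the lengths tensor inside dien.py). Note that .cpu() returns a new tensor rather than modifying the original in place, so the result has to be reassigned or passed directly into pack_padded_sequence:

import torch

# stand-in for the lengths tensor used inside dien.py
masked_keys_length = torch.randint(1, 7, (4,), device="cuda")

print(masked_keys_length.device)               # prints cuda:0 before the conversion
masked_keys_length = masked_keys_length.cpu()  # .cpu() returns a new tensor; it must be reassigned
print(masked_keys_length.device)               # prints cpu afterwards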

Jeriousman commented 2 years ago

Hello. I did that for all the pack_padded_sequence calls, for example masked_keys_length.cpu(). When I did this, the tensor was converted to a CPU one, but the error was still there. For me, only downgrading the torch version worked. It is strange though; that was the whole point of my question. It became a CPU tensor, but it didn't work. Is it working on your side?

zanshuxun commented 2 years ago

@Jeriousman I added .cpu() to all the pack_padded_sequence(...) calls in dien.py, and then it works. Maybe you missed something. Could you paste the traceback info and your dien.py file?
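
A sketch of what that edit might look like at the call site the traceback points to (dien.py line 221); apart from masked_keys_length, the variable names below are assumptions rather than copies from the actual file:

from torch.nn.utils.rnn import pack_padded_sequence

packed_keys = pack_padded_sequence(
    masked_keys,                       # assumed name for the masked key embeddings (stays on the GPU)
    lengths=masked_keys_length.cpu(),  # the fix: only the lengths tensor is moved to the CPU
    batch_first=True,
    enforce_sorted=False,
)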

umanniyaz commented 1 year ago

Hi, can anyone tell me how to handle this? I get the same error on torch==1.8.0.