ruotianluo / self-critical.pytorch

Unofficial pytorch implementation for Self-critical Sequence Training for Image Captioning. and others.
MIT License
991 stars 278 forks

Type error while training #250

Closed stephancheng closed 3 years ago

stephancheng commented 3 years ago

@ruotianluo Sorry again, after fixing the file problem, I got an error: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

run tools/train.py --cfg configs/fc_rl.yml --id fc_rl
DataLoader loading json file: data/cocotalk.json
vocab size is 9487
DataLoader loading h5 file: data/cocotalk_fc data/cocotalk_att data/cocotalk_box data/cocotalk_label.h5
max sequence length in data is 16
read 123287 image features
assigned 113287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test
Read data: 0.003994464874267578
Save ckpt on exception ...
model saved to ./log_fc_rl\model.pth
Save ckpt done.
Traceback (most recent call last):
  File "D:\Stephan\Final project\self-critical.pytorch-master\tools\train.py", line 183, in train
    model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag, struc_flag)
  File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\modules\loss_wrapper.py", line 45, in forward
    loss = self.crit(self.model(fc_feats, att_feats, labels[..., :-1], att_masks), labels[..., 1:], masks[..., 1:])
  File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\CaptionModel.py", line 33, in forward
    return getattr(self, '_'+mode)(*args, **kwargs)
  File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\AttModel.py", line 160, in _forward
    output, state = self.get_logprobs_state(it, p_fc_feats, p_att_feats, pp_att_feats, p_att_masks, state)
  File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\AttModel.py", line 167, in get_logprobs_state
    xt = self.embed(it)
  File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1484, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

ruotianluo commented 3 years ago

That is weird. For now, you can just change line 167 of D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\AttModel.py to xt = self.embed(it.long())

But it is not expected to happen.
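For readers hitting the same message, the suggested workaround can be sketched in isolation. This is a minimal illustration and not the repository's code: the embedding size and the index values are made up, and `it` only stands in for the IntTensor named in the error.

```python
import torch
import torch.nn as nn

# Toy embedding table; the sizes are illustrative, not the repo's settings.
embed = nn.Embedding(num_embeddings=10, embedding_dim=4)

# Stand-in for the word indices that arrived as a 32-bit IntTensor.
it = torch.tensor([1, 2, 3], dtype=torch.int32)

# .long() casts the indices to int64 (LongTensor), the type the error asks for.
xt = embed(it.long())
print(xt.shape)
```

The cast is a no-op when the indices are already int64, so it should be harmless on setups where the error never appears.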

stephancheng commented 3 years ago

Then I encountered another type error:

Traceback (most recent call last):
  File "D:\Stephan\Final project\self-critical.pytorch-master\tools\train.py", line 289, in <module>
    train(opt)
  File "D:\Stephan\Final project\self-critical.pytorch-master\tools\train.py", line 76, in train
    model = models.setup(opt).cuda()
  File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\__init__.py", line 30, in setup
    model = NewFCModel(opt)
  File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\AttModel.py", line 906, in __init__
    super(NewFCModel, self).__init__(opt)
TypeError: super(type, obj): obj must be an instance or subtype of type

ruotianluo commented 3 years ago

Still weird.

If you are using python3, you can replace super(....) with super()
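One common way to get this TypeError is for a class name to be rebound to a fresh class object, for example when a module is re-run or reloaded in an interactive session. Whether that is what happened here is an assumption; the thread does not confirm it. The sketch below uses hypothetical classes (`Base`, `Legacy`, `Modern`) to show why the zero-argument `super()` is more robust in that situation.

```python
def demo_super_after_rebinding():
    class Base:
        def __init__(self, opt):
            self.opt = opt

    class Legacy(Base):
        def __init__(self, opt):
            # Two-argument form: the name `Legacy` is resolved when this runs.
            super(Legacy, self).__init__(opt)

    class Modern(Base):
        def __init__(self, opt):
            # Zero-argument form: tied to the class this method was defined in.
            super().__init__(opt)

    old_legacy, old_modern = Legacy, Modern

    # Simulate a re-run or reload: the same names now refer to new classes.
    class Legacy(Base):
        pass

    class Modern(Base):
        pass

    try:
        old_legacy({"id": "fc_rl"})
        legacy_error = None
    except TypeError as exc:
        # self is an instance of the old class, not of the rebound `Legacy`.
        legacy_error = str(exc)

    modern_opt = old_modern({"id": "fc_rl"}).opt
    return legacy_error, modern_opt


legacy_error, modern_opt = demo_super_after_rebinding()
print(legacy_error)
print(modern_opt)
```

If stale class objects are the cause, restarting the interpreter before re-running the script is another way to clear them.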

stephancheng commented 3 years ago

Yes, I am using Python 3.7. It is really weird: the last error suddenly stopped appearing, but now there is another, similar type error in losses.py:

Traceback (most recent call last):
  File "D:\Stephan\Final project\self-critical.pytorch-master\tools\train.py", line 183, in train
    model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag, struc_flag)
  File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\modules\loss_wrapper.py", line 45, in forward
    loss = self.crit(self.model(fc_feats, att_feats, labels[..., :-1], att_masks), labels[..., 1:], masks[..., 1:])
  File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\modules\losses.py", line 179, in forward
    output = -input.gather(2, target.unsqueeze(2)).squeeze(2) * mask
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #3 'index' in call to _th_gather
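The same kind of cast should get past this second error as well, since `gather` expects an int64 index. The sketch below is a hypothetical stand-alone function (`masked_nll` is not a name from the repository) built around the gather expression shown in the traceback, with the cast added; the final normalization by the mask sum is an assumption made for illustration.

```python
import math
import torch


def masked_nll(logprobs, target, mask):
    """Masked negative log-likelihood over (batch, time, vocab) log-probs."""
    # Cast the indices to int64 so gather accepts them even if the
    # data loader produced 32-bit integers.
    target = target.long()
    out = -logprobs.gather(2, target.unsqueeze(2)).squeeze(2) * mask
    # Assumed normalization: average over unmasked positions.
    return out.sum() / mask.sum()


# Toy inputs: uniform distribution over a 5-word vocabulary.
logprobs = torch.log_softmax(torch.zeros(2, 3, 5), dim=2)
target = torch.tensor([[1, 2, 3], [0, 4, 1]], dtype=torch.int32)
mask = torch.ones(2, 3)

loss = masked_nll(logprobs, target, mask)
print(loss.item(), math.log(5))
```

Casting once where the labels are loaded, rather than at every use, would avoid patching several files.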

ruotianluo commented 3 years ago

May I ask which version of PyTorch you are using?


stephancheng commented 3 years ago

PyTorch version is 1.3.1; CUDA is 10.0.

ruotianluo commented 3 years ago

The problem seems to be that integer numpy arrays are being converted to IntTensor instead of LongTensor. I have never seen this before. It may be a problem with the PyTorch version, or it may be because you are using Windows.
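A likely mechanism, offered as a hedged note: `torch.from_numpy` keeps the array's dtype, and on Windows NumPy releases before 2.0 default to 32-bit integers, so integer arrays created without an explicit dtype can end up as IntTensor. The sketch below forces `int32` explicitly to mimic that situation on any platform; `labels_np` is a made-up array, not data from the repository.

```python
import numpy as np
import torch

# Explicit int32 mimics the default integer type on Windows with older NumPy.
labels_np = np.array([[1, 2, 3], [4, 5, 0]], dtype=np.int32)

# from_numpy preserves the dtype, so this is a 32-bit IntTensor.
as_int = torch.from_numpy(labels_np)

# Converting at the source yields the LongTensor that embedding/gather expect.
as_long = torch.from_numpy(labels_np.astype(np.int64))

print(as_int.dtype, as_long.dtype)
```

Checking `np.array([1]).dtype` on the affected machine is a quick way to confirm whether this applies.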

stephancheng commented 3 years ago

May I know the exact versions that you used?

ruotianluo commented 3 years ago

I'm now using 1.8

stephancheng commented 3 years ago

I think it is a Windows problem: it still failed with PyTorch 1.8 on Windows, but it ran successfully with PyTorch 1.9 on a Linux machine. Thanks anyway!