Error when running test.py

haofanwang commented 3 years ago

Hi, @jacobswan1

https://github.com/jacobswan1/Video2Commonsense/blob/c5b5d807508923577b642fa037d2d5c39bcb7ba4/test.py#L46 throws an error "AttributeError: 'list' object has no attribute 'cuda'". If I remove this line, the error becomes "IndexError: tensors used as indices must be long, byte or bool tensors".

Could you take a look?

jacobswan1 commented 3 years ago

Hi Haofan,

cms_list is only used for evaluating the CMS, so simply commenting out this line should address the error. I'll push an updated version of the repo as well. Thanks for pointing this out.

Yours, Zhiyuan

haofanwang commented 3 years ago

Thanks for your quick reply, @jacobswan1.

As I said before, I have already tried commenting out the line, but it still does not work. The full error output is below.

(icml) haofan@demeter:~/kaust/Video2Commonsense$ python test.py --cms 'int' --batch_size 64 --num_layer 6 --dim_head 64 --dim_inner 1024 --num_head 8 --dim_vis_feat 2048 --dropout 0.1 --rnn_layer 1 --checkpoint_path ./save --info_json data/v2c_info.json --caption_json data/V2C_MSR-VTT_caption.json --load_checkpoint save/CMS_CAP_MODEL_INT_lr_0.044_BS_128_Layer_6_ATTHEAD_8_HID_512_RNNLayer_1/CMS_CAP_MODEL_INT_lr_0.044_BS_128_Layer_6_ATTHEAD_8_HID_512_RNNLayer_1_epoch_20.pth --cuda
Caption vocab size is 29326
CMS vocab size is 26813
number of train videos: 6819
number of test videos: 2903
number of val videos: 0
load feats from ['data/feats/resnet152/']
max sequence length of caption is 28
/home/haofan/anaconda3/envs/icml/lib/python3.9/site-packages/torch/nn/modules/rnn.py:58: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  warnings.warn("dropout option adds dropout after all but last "
69221376
Traceback (most recent call last):
  File "/home/haofan/kaust/Video2Commonsense/test.py", line 157, in <module>
    main(opt)
  File "/home/haofan/kaust/Video2Commonsense/test.py", line 148, in main
    test(dataloader, model, opt, dataset.get_cap_vocab(), dataset.get_cms_vocab())
  File "/home/haofan/kaust/Video2Commonsense/test.py", line 57, in test
    cms_batch_hyp = translate_batch(model, fc_feats, cap_labels, opt)
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 145, in translate_batch
    active_inst_idx_list = beam_decode_step(
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 94, in beam_decode_step
    dec_seq = prepare_beam_dec_seq(inst_dec_beams, len_dec_seq)
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 59, in prepare_beam_dec_seq
    dec_partial_seq = [b.get_current_state() for b in inst_dec_beams if not b.done]
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 59, in <listcomp>
    dec_partial_seq = [b.get_current_state() for b in inst_dec_beams if not b.done]
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/Beam.py", line 34, in get_current_state
    return self.get_tentative_hypothesis()
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/Beam.py", line 91, in get_tentative_hypothesis
    hyps = [self.get_hypothesis(k) for k in keys]
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/Beam.py", line 91, in <listcomp>
    hyps = [self.get_hypothesis(k) for k in keys]
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/Beam.py", line 101, in get_hypothesis
    hyp.append(self.next_ys[j+1][k])
IndexError: tensors used as indices must be long, byte or bool tensors

I also tried the pretrained model you provided, and the error is the same. If I cast k to long, I get a different error instead.

jacobswan1 commented 3 years ago

Hi Haofan,

This looks a bit strange; I have never run into this error before.

Could you tell me which PyTorch version you are using? And if you step through it in a debugger, what is the value of k?

Also, since you said another error appears after casting k to long, could you show me that error as well?

Thanks.

haofanwang commented 3 years ago

The error is kind of strange. I installed the newest PyTorch (torch==1.7.1+cu110). My Python is 3.9, but I don't think that matters.

When I print the value of k in Beam.py, I get tensor(0., device='cuda:0'), which is a float. If I add k=k.long() in the code, the error changes to the more complicated one below.

/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [54,0,0], thread: [64,0,0] Assertion srcIndex < srcSelectDimSize failed.

Traceback (most recent call last):
  File "test.py", line 156, in <module>
    main(opt)
  File "test.py", line 147, in main
    test(dataloader, model, opt, dataset.get_cap_vocab(), dataset.get_cms_vocab())
  File "test.py", line 56, in test
    cms_batch_hyp = translate_batch(model, fc_feats, cap_labels, opt)
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 146, in translate_batch
    inst_dec_beams, len_dec_seq, src_seq, src_enc, inst_idx_to_position_map, n_bm, mode='int')
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 99, in beam_decode_step
    active_inst_idx_list = collect_active_inst_idx_list(inst_dec_beams, word_prob, inst_idx_to_position_map)
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 86, in collect_active_inst_idx_list
    is_inst_complete = inst_beams[inst_idx].advance(word_prob[inst_position])
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/Beam.py", line 70, in advance
    if self.next_ys[-1][0].item() == Constants.EOS:
RuntimeError: CUDA error: device-side assert triggered
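
To illustrate why a float k breaks the lookup, here is a minimal sketch of the backpointer walk that get_hypothesis in Beam.py appears to perform, judging from the traceback line hyp.append(self.next_ys[j+1][k]). Plain Python lists stand in for the CUDA tensors. The loop structure and all values are assumptions made for illustration, not the repository's code.

```python
# Illustrative sketch only: plain Python lists stand in for the tensors in Beam.py.

def get_hypothesis(prev_ks, next_ys, k):
    """Walk beam backpointers from the last step to the first; return token ids in order."""
    hyp = []
    for j in range(len(prev_ks) - 1, -1, -1):
        hyp.append(next_ys[j + 1][k])  # token chosen at step j+1 on beam k
        k = prev_ks[j][k]              # follow the backpointer to the parent beam
    return hyp[::-1]

# Two decoding steps with beam size 2 (made-up values).
next_ys = [[2, 2], [5, 7], [9, 4]]     # next_ys[0] holds the start tokens
int_prev_ks = [[0, 0], [1, 0]]         # integer backpointers
print(get_hypothesis(int_prev_ks, next_ys, 0))  # [7, 9]

# The same backpointers stored as floats cannot be used as indices.
float_prev_ks = [[0.0, 0.0], [1.0, 0.0]]
try:
    get_hypothesis(float_prev_ks, next_ys, 0)
except TypeError as err:
    print("float backpointer rejected:", err)
```

The sketch suggests that the backpointers themselves need to be integers when they are stored. Casting k at lookup time only hides the symptom if the stored values were computed with the wrong kind of division.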

haofanwang commented 3 years ago

@jacobswan1, what is your environment? If possible, could you provide a requirements.txt? I think the error may be caused by the versions of some packages.

If you are familiar with Google Colab, it would be even better to provide a Colab link for running the test on the pre-trained model.

jacobswan1 commented 3 years ago

@haofanwang I guess it could be caused by a difference in PyTorch versions.

I built this in an old Anaconda env of mine, which uses PyTorch 1.1.0. I don't think the other dependencies are likely to trigger this error, only the PyTorch version. Could you try downgrading PyTorch to see if it runs?

On my side, I'll try to migrate the code to a newer PyTorch version and update the repo, and I'll also see whether I can put the evaluation on Colab. I have some other tasks on hand right now, so it will probably take me a while (by the end of the weekend or Monday).

haofanwang commented 3 years ago

Sure. Thanks for helping!

I will try PyTorch 1.1.0 and see whether it works.

haofanwang commented 3 years ago

Hi, @jacobswan1,

It works now. It was indeed a PyTorch version problem. Thanks.

jacobswan1 commented 3 years ago

Hi Haofan, glad it works! I'll still work on adapting it to a newer version later. Thanks very much for your feedback.

Jeff-LiangF commented 2 years ago

Hey @jacobswan1,

Thanks for your great repo! I encountered the same issue as @haofanwang with PyTorch 1.7.1, and I solved it by replacing prev_k = best_scores_id / num_words with prev_k = torch.div(best_scores_id, num_words, rounding_mode='floor') here. :)
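
For anyone wondering why that one-line change matters, here is a small sketch of the index arithmetic, using plain Python integers in place of tensors. It assumes that best_scores_id is a flat index into a (beam, word) score table, which is what the variable names suggest. The helper name and the numbers are hypothetical. The beam index has to come from floor division, whereas true division produces a float that cannot be used as an index.

```python
# Illustrative sketch: plain integers stand in for the tensors in Beam.py.

def split_flat_index(flat_id, num_words):
    """Split a flattened (beam, word) index into its beam part and its word part."""
    prev_k = flat_id // num_words            # floor division keeps the beam index an integer
    word_id = flat_id - prev_k * num_words   # equivalent to flat_id % num_words
    return prev_k, word_id

num_words = 10                 # hypothetical vocabulary size
flat_id = 2 * num_words + 7    # entry for beam 2, word 7
print(split_flat_index(flat_id, num_words))  # (2, 7)
print(flat_id / num_words)                   # 2.7, true division is not a valid beam index
```

As far as I know, the / operator on integer tensors floor-divided in old PyTorch releases such as 1.1.0 and performs true division in recent ones, which would explain the float k reported earlier in this thread. I also believe the rounding_mode argument of torch.div only exists in newer releases. Where it is unavailable, best_scores_id // num_words should give the same result for these non-negative indices, though I have not checked that against this repository.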

jacobswan1 / Video2Commonsense

Error when running test.py #3