Closed haofanwang closed 3 years ago
Hi Haofan,
cms_list is only used for evaluating the CMS, so simply commenting out this line should resolve the error. I'll also push an updated version of the repo. Thanks for pointing this out.
Yours, Zhiyuan
Thanks for your quick reply, @jacobswan1.
As I said before, I have tried commenting out the line, but it still does not work. Below is the full error.
(icml) haofan@demeter:~/kaust/Video2Commonsense$ python test.py --cms 'int' --batch_size 64 --num_layer 6 --dim_head 64 --dim_inner 1024 --num_head 8 --dim_vis_feat 2048 --dropout 0.1 --rnn_layer 1 --checkpoint_path ./save --info_json data/v2c_info.json --caption_json data/V2C_MSR-VTT_caption.json --load_checkpoint save/CMS_CAP_MODEL_INT_lr_0.044_BS_128_Layer_6_ATTHEAD_8_HID_512_RNNLayer_1/CMS_CAP_MODEL_INT_lr_0.044_BS_128_Layer_6_ATTHEAD_8_HID_512_RNNLayer_1_epoch_20.pth --cuda
Caption vocab size is 29326
CMS vocab size is 26813
number of train videos: 6819
number of test videos: 2903
number of val videos: 0
load feats from ['data/feats/resnet152/']
max sequence length of caption is 28
/home/haofan/anaconda3/envs/icml/lib/python3.9/site-packages/torch/nn/modules/rnn.py:58: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  warnings.warn("dropout option adds dropout after all but last "
69221376
Traceback (most recent call last):
  File "/home/haofan/kaust/Video2Commonsense/test.py", line 157, in <module>
    main(opt)
  File "/home/haofan/kaust/Video2Commonsense/test.py", line 148, in main
    test(dataloader, model, opt, dataset.get_cap_vocab(), dataset.get_cms_vocab())
  File "/home/haofan/kaust/Video2Commonsense/test.py", line 57, in test
    cms_batch_hyp = translate_batch(model, fc_feats, cap_labels, opt)
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 145, in translate_batch
    active_inst_idx_list = beam_decode_step(
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 94, in beam_decode_step
    dec_seq = prepare_beam_dec_seq(inst_dec_beams, len_dec_seq)
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 59, in prepare_beam_dec_seq
    dec_partial_seq = [b.get_current_state() for b in inst_dec_beams if not b.done]
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 59, in <listcomp>
    dec_partial_seq = [b.get_current_state() for b in inst_dec_beams if not b.done]
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/Beam.py", line 34, in get_current_state
    return self.get_tentative_hypothesis()
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/Beam.py", line 91, in get_tentative_hypothesis
    hyps = [self.get_hypothesis(k) for k in keys]
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/Beam.py", line 91, in <listcomp>
    hyps = [self.get_hypothesis(k) for k in keys]
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/Beam.py", line 101, in get_hypothesis
    hyp.append(self.next_ys[j+1][k])
IndexError: tensors used as indices must be long, byte or bool tensors
I also tried the pretrained model you provided, and the error is the same. If I cast k to a long type, I get a different error instead.
Hi Haofan,
This looks a bit strange; I have never run into this error before.
Could you tell me which PyTorch version you are using? And if you are debugging, what is the value of k?
Since you said another error appears after casting k to a long type, could you show me that error as well?
Thanks.
The error is kind of strange. I installed the newest PyTorch (torch==1.7.1+cu110). My Python is 3.9, but I don't think that matters.
When I print out the value of k in Beam.py, I get tensor(0., device='cuda:0'), which is a float type. If I add k=k.long() in the code, the error log becomes more complex, as below.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [54,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "test.py", line 156, in <module>
    main(opt)
  File "test.py", line 147, in main
    test(dataloader, model, opt, dataset.get_cap_vocab(), dataset.get_cms_vocab())
  File "test.py", line 56, in test
    cms_batch_hyp = translate_batch(model, fc_feats, cap_labels, opt)
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 146, in translate_batch
    inst_dec_beams, len_dec_seq, src_seq, src_enc, inst_idx_to_position_map, n_bm, mode='int')
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 99, in beam_decode_step
    active_inst_idx_list = collect_active_inst_idx_list(inst_dec_beams, word_prob, inst_idx_to_position_map)
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/cap2cms_Translator.py", line 86, in collect_active_inst_idx_list
    is_inst_complete = inst_beams[inst_idx].advance(word_prob[inst_position])
  File "/home/haofan/kaust/Video2Commonsense/model/transformer/Beam.py", line 70, in advance
    if self.next_ys[-1][0].item() == Constants.EOS:
RuntimeError: CUDA error: device-side assert triggered
@jacobswan1, what is your environment? If possible, could you provide a requirements.txt? I think this may be caused by the versions of some packages.
If you are familiar with Google Colab, it would be even better to provide a Colab link for running the test on the pre-trained model.
@haofanwang I guess this may be caused by a difference in PyTorch version.
I built this in an old Anaconda env of mine, which uses PyTorch version 1.1.0. Apart from the PyTorch version, I don't think the other dependencies are likely to trigger this error. Could you try downgrading PyTorch to see if it runs?
In the meantime, I'll try to migrate the code to a newer PyTorch version and update the repo, and I'll also see whether I can put the evaluation on Colab. I have some other tasks in hand, though, so it will probably take me some time (by the end of the weekend or Monday).
Sure. Thanks for helping!
I will try PyTorch 1.1.0 and see whether it works.
Hi, @jacobswan1,
It works now. It was indeed a PyTorch version problem. Thanks.
Hi Haofan, glad it works! I'll still work on adapting the code to a newer version later. Thanks very much for your feedback.
Hey @jacobswan1,
Thanks for your great repo! I encountered the same issue as @haofanwang with PyTorch 1.7.1, and I solved it by replacing prev_k = best_scores_id / num_words with prev_k = torch.div(best_scores_id, num_words, rounding_mode='floor') here. :)
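For anyone wondering why that one-line change matters, here is a minimal sketch of the index arithmetic in plain Python, so it runs without PyTorch. It assumes the beam search picks the best entry from a flattened (beam, word) score table, which is what the prev_k line suggests. The helper split_flat_index and the toy scores are hypothetical and only for illustration; the actual Beam.py may differ. With tensors, torch.div(..., rounding_mode='floor') plays the role of // below.

```python
def split_flat_index(flat_index, num_words):
    """Split an index into a flattened (beam, word) table into its two parts."""
    prev_k = flat_index // num_words            # floor division keeps an int
    word_id = flat_index - prev_k * num_words   # position inside that beam's row
    return prev_k, word_id


num_words = 5
# Hypothetical scores: one row per beam, one column per vocabulary word.
scores = [
    [0.1, 0.2, 0.0, 0.3, 0.1],
    [0.0, 0.9, 0.1, 0.2, 0.4],
    [0.5, 0.1, 0.7, 0.0, 0.2],
]

# Flatten the table and take the position of the best score.
flat = [s for row in scores for s in row]
best_flat_index = max(range(len(flat)), key=flat.__getitem__)

prev_k, word_id = split_flat_index(best_flat_index, num_words)
print("beam:", prev_k, "word:", word_id)

# True division returns a float, which cannot be used as an index.
bad_prev_k = best_flat_index / num_words
try:
    scores[bad_prev_k]
except TypeError as exc:
    print("float index rejected:", exc)
```

The float produced by true division is the plain-Python counterpart of the float tensor(0., device='cuda:0') reported for k earlier in this thread. Flooring at the point where prev_k is computed keeps every stored index an integer, rather than casting later at the point where it is used.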
Hi, @jacobswan1
https://github.com/jacobswan1/Video2Commonsense/blob/c5b5d807508923577b642fa037d2d5c39bcb7ba4/test.py#L46 throws an error "AttributeError: 'list' object has no attribute 'cuda'". If I remove this line, the error becomes "IndexError: tensors used as indices must be long, byte or bool tensors".
Could you look into it?