jackroos / VL-BERT

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
MIT License

Unrecognized tensor type ID: AutogradCUDA #61

Closed jkkishore1999 closed 3 years ago

jkkishore1999 commented 3 years ago

When I ran the following script to fine-tune on RefCOCO:

! bash ./scripts/nondist_run.sh refcoco/train_end2end.py 'cfgs/refcoco/base_gt_boxes_4x16G.yaml' refcoco_base_gtckpt

I encountered the following error.

[Partial Load] non matched keys: ['object_mask_word_embedding.weight', 'aux_text_visual_embedding.weight', 'vlbert.mlm_head.predictions.bias', 'vlbert.mlm_head.predictions.transform.dense.weight', 'vlbert.mlm_head.predictions.transform.dense.bias', 'vlbert.mlm_head.predictions.transform.LayerNorm.weight', 'vlbert.mlm_head.predictions.transform.LayerNorm.bias', 'vlbert.mlm_head.predictions.decoder.weight', 'vlbert.mvrc_head.region_cls_pred.weight', 'vlbert.mvrc_head.region_cls_pred.bias']
[Partial Load] non pretrain keys: ['final_mlp.2.weight', 'final_mlp.2.bias']
PROGRESS: 0.00%
/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../common/fast_rcnn.py:136: UserWarning: This overload of nonzero is deprecated:
    nonzero()
Consider using one of the following signatures instead:
    nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  box_inds = box_mask.nonzero()
Traceback (most recent call last):
  File "refcoco/train_end2end.py", line 60, in <module>
    main()
  File "refcoco/train_end2end.py", line 54, in main
    rank, model = train_net(args, config)
  File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../refcoco/function/train.py", line 323, in train_net
    gradient_accumulate_steps=config.TRAIN.GRAD_ACCUMULATE_STEPS)
  File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../common/trainer.py", line 115, in train
    outputs, loss = net(batch)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../common/module.py", line 22, in forward
    return self.train_forward(*inputs, **kwargs)
  File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../refcoco/modules/resnet_vlbert_for_refcoco.py", line 96, in train_forward
    segms=None)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../common/fast_rcnn.py", line 149, in forward
    roi_align_res = self.roi_align(img_feats['body4'], rois).type(images.dtype)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../common/lib/roi_pooling/roi_align.py", line 69, in forward
    input.float(), rois.float(), self.output_size, self.spatial_scale, self.sampling_ratio
  File "/content/gdrive/My Drive/DDP/VL-BERT/refcoco/../common/lib/roi_pooling/roi_align.py", line 20, in forward
    input, rois, spatial_scale, output_size[0], output_size[1], sampling_ratio
RuntimeError: Unrecognized tensor type ID: AutogradCUDA

I am running on Google Colab with PyTorch 1.7.0, torchvision 0.8.1, and CUDA 10.1. The same error occurs with CUDA 9.2. When I use PyTorch 1.1.0, as mentioned in the README, I get many errors related to torchvision modules. Please help; I urgently need this to complete my project.

G-Apple1 commented 3 years ago

Has this been solved? I also encountered this problem.

jkkishore1999 commented 3 years ago

I have not been able to solve it yet. Please let me know if you find a solution.

jackroos commented 3 years ago

@jkkishore1999 @G-Apple1 Could you provide more information about your environment, especially the versions of GCC, CUDA, and PyTorch?
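One way to gather the requested details in a single step is sketched below. It is only an illustration, not part of the VL-BERT repository: the helper name `collect_env` is hypothetical, and it assumes `gcc` is on the `PATH`. PyTorch also ships a similar report via `python -m torch.utils.collect_env`.

```python
import shutil
import subprocess
import sys


def collect_env():
    """Return a dict with the Python, GCC, PyTorch, torchvision and CUDA versions.

    `collect_env` is a hypothetical helper written for this thread.
    """
    info = {"python": sys.version.split()[0]}

    # GCC: keep the first line of `gcc --version`, or note that gcc is missing.
    gcc = shutil.which("gcc")
    if gcc is None:
        info["gcc"] = "not found"
    else:
        proc = subprocess.run(
            [gcc, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        lines = proc.stdout.splitlines()
        info["gcc"] = lines[0] if lines else "unknown"

    # PyTorch and the CUDA version it was built against (None for CPU builds).
    try:
        import torch

        info["torch"] = torch.__version__
        info["cuda (torch build)"] = str(torch.version.cuda)
    except ImportError:
        info["torch"] = "not installed"
        info["cuda (torch build)"] = "unknown"

    try:
        import torchvision

        info["torchvision"] = torchvision.__version__
    except ImportError:
        info["torchvision"] = "not installed"

    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

In a Colab cell, pasting and running the block prints one line per item, which can be copied into the issue as-is.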

jkkishore1999 commented 3 years ago

> @jkkishore1999 @G-Apple1 Could you provide more information about your environment, especially the versions of GCC, CUDA, and PyTorch?

I am running on Google Colab with PyTorch 1.7.0, torchvision 0.8.1, and CUDA 10.1. The same error occurs with CUDA 9.2. When I use PyTorch 1.1.0, as mentioned in the README, I get many errors related to torchvision modules. Please help; I urgently need this to complete my project.

G-Apple1 commented 3 years ago

> > @jkkishore1999 @G-Apple1 Could you provide more information about your environment, especially the versions of GCC, CUDA, and PyTorch?
>
> I am running on Google Colab with PyTorch 1.7.0, torchvision 0.8.1, and CUDA 10.1. The same error occurs with CUDA 9.2. When I use PyTorch 1.1.0, as mentioned in the README, I get many errors related to torchvision modules. Please help; I urgently need this to complete my project.

Me too.

jackroos commented 3 years ago

@jkkishore1999 @G-Apple1 I haven't tested the code on torch 1.7.0. I think you need to use torch 1.1.0, together with the corresponding version of torchvision (e.g., 0.3.0; see this page for the correspondence).
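A quick way to confirm that an environment matches this pairing before launching training might look like the sketch below. The names `EXPECTED`, `base_version`, and `find_mismatches` are hypothetical helpers written for this thread, not part of VL-BERT; the only versions assumed are the torch 1.1.0 / torchvision 0.3.0 pair mentioned above.

```python
# Version pair suggested in this thread: torch 1.1.0 with torchvision 0.3.0.
EXPECTED = {"torch": "1.1.0", "torchvision": "0.3.0"}


def base_version(version):
    """Drop a local build suffix such as '+cu101' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(installed, expected=EXPECTED):
    """Return {package: (installed, expected)} for every package that differs.

    `installed` maps package names to version strings; a missing package
    is reported with None as its installed version.
    """
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or base_version(have) != want:
            mismatches[name] = (have, want)
    return mismatches


if __name__ == "__main__":
    installed = {}
    for name in EXPECTED:
        try:
            installed[name] = __import__(name).__version__
        except ImportError:
            installed[name] = None  # package is not installed

    problems = find_mismatches(installed)
    if problems:
        for name, (have, want) in problems.items():
            print(f"{name}: installed {have}, expected {want}")
    else:
        print("torch and torchvision match the expected versions")
```

With the versions reported earlier in this thread (torch 1.7.0, torchvision 0.8.1), both packages would be flagged as mismatched.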