jackroos / VL-BERT

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
MIT License
735 stars 110 forks

_pickle.UnpicklingError: invalid load key, '-'. #45

Closed liulijie-2020 closed 4 years ago

liulijie-2020 commented 4 years ago

Thanks for your great code. I am trying to train on the VCR task to see the results. When I ran

python vcr/val.py \
  --a-cfg ./cfgs/vcr/base_q2a_4x16G_fp32.yaml --r-cfg ./cfgs/vcr/base_qa2r_4x16G_fp32.yaml \
  --a-ckpt ./output/base_q2a_4x16G_fp32.yaml --r-ckpt ./output/base_qa2r_4x16G_fp32.yaml \
  --gpus 0 1 \
  --result-path ./results/ --result-name eval_vcr

the following error occurred:

warnings.warn('miss keys: {}'.format(miss_keys))
Warnings: Unexpected keys: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.gamma', 'cls.predictions.transform.LayerNorm.beta', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias'].
Traceback (most recent call last):
  File "vcr/val.py", line 214, in <module>
    main()
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
    return func(*args, **kwargs)
  File "vcr/val.py", line 114, in main
    a_ckpt = torch.load(args.a_ckpt, map_location=lambda storage, loc: storage)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/serialization.py", line 564, in _load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '-'.

I hope to get some help solving this problem. Thanks a lot.

jackroos commented 4 years ago

Please follow the instructions in readme to fine-tune VL-BERT on VCR first, then you can do evaluation on it. Thank you!
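
For readers hitting the same message: `invalid load key, '-'` means the first byte `torch.load` read was a `-`, which suggests the path did not point at a saved checkpoint (note that the `--a-ckpt` and `--r-ckpt` arguments in the command above end in `.yaml`). The sketch below is one way to sanity-check a path before loading it. The helper names are hypothetical and not part of VL-BERT, and the check is only a heuristic based on the leading bytes that `torch.save` output normally starts with (a zip archive in newer releases, a binary pickle stream in older ones).

```python
from pathlib import Path

# Leading bytes of the two on-disk layouts torch.save normally produces.
# Assumption: anything else (for example a YAML text file) is not a checkpoint.
ZIP_MAGIC = b"PK"         # zip-based format used by newer PyTorch releases
PICKLE_PROTO = b"\x80"    # binary pickle stream used by older releases


def looks_like_torch_checkpoint(path):
    """Heuristic: True if `path` is a file that starts like a zip or a binary pickle."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        head = f.read(2)
    return head.startswith(ZIP_MAGIC) or head.startswith(PICKLE_PROTO)


def describe_checkpoint_path(path):
    """Return a short human-readable diagnosis for a checkpoint argument."""
    p = Path(path)
    if not p.exists():
        return f"{path}: does not exist"
    if p.is_dir():
        return f"{path}: is a directory, not a checkpoint file"
    if looks_like_torch_checkpoint(p):
        return f"{path}: looks like a serialized checkpoint"
    return f"{path}: does not start like a checkpoint (text or config file?)"
```

Calling `describe_checkpoint_path(args.a_ckpt)` before `torch.load` would flag a config file or a missing fine-tuned model early, instead of failing inside the unpickler.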

liulijie-2020 commented 4 years ago

> Please follow the instructions in readme to fine-tune VL-BERT on VCR first, then you can do evaluation on it. Thank you!

Thank you for your reply. By fine-tuning, do you mean the training step in the README?

jackroos commented 4 years ago

Yes.

liulijie-2020 commented 4 years ago

> Yes.

Thank you for your kind help. I have done this part and got some parameter files from it, but then training failed:

`PROGRESS: 0.00%
Traceback (most recent call last):
  File "vcr/train_end2end.py", line 59, in <module>
    main()
  File "vcr/train_end2end.py", line 53, in main
    rank, model = train_net(args, config)
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../vcr/function/train.py", line 337, in train_net
    gradient_accumulate_steps=config.TRAIN.GRAD_ACCUMULATE_STEPS)
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/trainer.py", line 115, in train
    outputs, loss = net(*batch)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/module.py", line 22, in forward
    return self.train_forward(*inputs, **kwargs)
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../vcr/modules/resnet_vlbert_for_vcr.py", line 261, in train_forward
    segms=segms)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/fast_rcnn.py", line 149, in forward
    roi_align_res = self.roi_align(img_feats['body4'], rois).type(images.dtype)
  File "/home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/lib/roi_pooling/roi_align.py", line 69, in forward
    input.float(), rois.float(), self.output_size, self.spatial_scale, self.sampling_ratio
  File "/home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/lib/roi_pooling/roi_align.py", line 20, in forward
    input, rois, spatial_scale, output_size[0], output_size[1], sampling_ratio
RuntimeError: Not compiled with GPU support (ROIAlign_forward at /home/songzijie/project/VLbert/VL-BERT-master/common/lib/roi_pooling/ROIAlign.h:21)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f5c43697dc5 in /home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: ROIAlign_forward(at::Tensor const&, at::Tensor const&, float, int, int, int) + 0xf6 (0x7f5c2bda8396 in /home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/lib/roi_pooling/C_ROIPooling.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x13f74 (0x7f5c2bdb3f74 in /home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/lib/roi_pooling/C_ROIPooling.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x13ffe (0x7f5c2bdb3ffe in /home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/lib/roi_pooling/C_ROIPooling.cpython-36m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x1138c (0x7f5c2bdb138c in /home/songzijie/project/VLbert/VL-BERT-master/vcr/../common/lib/roi_pooling/C_ROIPooling.cpython-36m-x86_64-linux-gnu.so)
frame #11: THPFunction_apply(_object*, _object*) + 0x691 (0x7f5c696e7081 in /home/songzijie/.conda/envs/vlbert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)`

By the way, I checked my CUDA setup:

`python -c 'import torch; from torch.utils.cpp_extension import CUDA_HOME; print(torch.cuda.is_available(), CUDA_HOME)'
True /home/share/cuda/cuda-9.0`

jackroos commented 4 years ago

@liulijie-2020 Did you run the init.sh to compile the operators?

liulijie-2020 commented 4 years ago

> @liulijie-2020 Did you run the init.sh to compile the operators?

Yes, I did:

running build_ext
copying build/lib.linux-x86_64-3.6/C_ROIPooling.cpython-36m-x86_64-linux-gnu.so ->

Thanks for your help. I've solved the problem. The cause was the scipy version, 1.5.1. After I changed it to scipy==1.4.1 and restarted, the program ran normally.
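
Since the fix reported here came down to a package version, a small check run before training can catch this kind of drift. The sketch below is only an illustration: `installed_version` and `check_pins` are hypothetical helpers, not part of VL-BERT, and `importlib.metadata` needs Python 3.8 or newer, whereas the environment in this thread was Python 3.6.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Compare exact version pins against the current environment.

    `pins` maps package name -> expected version string.
    Returns {package: (expected, found)} for every pin that is not met;
    `found` is None when the package is not installed.
    """
    mismatches = {}
    for package, expected in pins.items():
        found = installed_version(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches
```

For the case in this thread, `check_pins({"scipy": "1.4.1"})` would return an empty dict once the downgrade has taken effect, and a `{"scipy": ("1.4.1", "1.5.1")}` entry otherwise.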

jackroos commented 4 years ago

Thanks for the information!