linjieli222 / HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
https://arxiv.org/abs/2005.00200
MIT License

some questions about inplace #23

Closed y1129800378 closed 3 years ago

y1129800378 commented 3 years ago

When I run train_vcmr.py in my own env (PyTorch 1.2), I get an error:

```
Traceback (most recent call last):
  File "train_vcmr.py", line 403, in <module>
    main(args)
  File "train_vcmr.py", line 232, in main
    scaled_loss.backward()
  File "/data/cdp_algo_ceph_ssd/users/yuyangyin/uniter/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/data/cdp_algo_ceph_ssd/users/yuyangyin/uniter/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [74, 768]], which is output 0 of SelectBackward, is at version 652; expected version 651 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```

It looks like this issue: https://discuss.pytorch.org/t/solved-pytorch1-5-runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation/90256
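The hint at the end of the traceback refers to PyTorch's anomaly-detection mode, which prints a second traceback pointing at the forward-pass op whose output was later modified in place. A minimal sketch of enabling it (the placement near the top of train_vcmr.py is an assumption, not HERO's actual code):

```python
import torch

# Enable autograd anomaly detection before training starts; when
# backward() fails, PyTorch will also print the forward traceback of
# the op that produced the tensor modified in place.
torch.autograd.set_detect_anomaly(True)
```

Anomaly detection slows training noticeably, so it is best enabled only while debugging.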

It is caused by model/model.py L212: `transformed_c_v_feats = transformed_c_v_feats + matched_v_feats`

So I am worried this will cause problems. How can I fix it? Thanks~
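For context, here is a minimal sketch that reproduces the same version-counter error outside HERO (the tensor names are made up for illustration), together with the common fix of cloning the selected view before it gets saved for backward:

```python
import torch

x = torch.randn(3, 4, requires_grad=True)
h = x * 2  # non-leaf intermediate tensor

# Broken variant: v = h[0] is a view of h, and (v * v) saves v for
# backward. The in-place write h[0] = 0. bumps the shared version
# counter, so out.backward() raises the same RuntimeError, likewise
# naming SelectBackward:
#
#   v = h[0]
#   out = (v * v).sum()
#   h[0] = 0.
#   out.backward()  # RuntimeError: ... modified by an inplace operation

# Fixed variant: clone() copies the row out of the view, so the later
# in-place write to h no longer invalidates the tensor saved for backward.
v = h[0].clone()
out = (v * v).sum()
h[0] = 0.
out.backward()
print(x.grad[0])  # backward succeeds; gradients flow to the leaf
```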

Unified-Robots commented 3 years ago

@y1129800378 This may be related to your PyTorch version. I used to use PyTorch 1.2 and ran into nearly the same problem. I'm now using PyTorch 1.7 and do not have this issue for either pre-training or fine-tuning.
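Since the traceback shows a custom environment path, it may also be worth confirming which PyTorch build the training script actually imports; a quick check (nothing HERO-specific assumed):

```python
import torch

print(torch.__version__)   # should report 1.7.x after upgrading
print(torch.version.cuda)  # CUDA version the installed wheel was built for
```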