Open cnnAndBn opened 3 years ago
Hi, I have now installed torch 1.1, but another error is reported in "openseg.pytorch-master/lib/extensions/inplace_abn_1/functions.py", line 208, in backward:
z, var, weight, bias = ctx.saved_tensors
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4, 320, 16, 16]], which is output 0 of InPlaceABNSyncBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Hi @PkuRainBow @hsfzxjy @LayneH, I am now using my own data to train a SegFix model for post-processing. Regarding the torch version: your config files use inplace_abn for all BN layers, and in the BN implementation at https://github.com/openseg-group/openseg.pytorch/blob/2c459f3b42deee26194f1802f353887d945e14c4/lib/models/tools/module_helper.py#L77 it seems that only torch versions 0.4, 1.0, 1.1, and 1.2 are supported. Can I use torch 1.5 or higher? Another question: is "syncbn" OK for the SegFix model? Have you compared it with inplace_abn?
We have a branch, https://github.com/openseg-group/openseg.pytorch/tree/pytorch-1.7 , in which some of the OCR models support SyncBN and distributed training on PyTorch 1.7. You may try it out with SegFix.
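For readers unfamiliar with SyncBN, here is a minimal sketch of how stock PyTorch swaps ordinary batch-norm layers for synchronized ones. It uses only the built-in torch.nn.SyncBatchNorm.convert_sync_batchnorm helper; the toy model is an assumption for illustration, and this is not necessarily how the pytorch-1.7 branch wires SyncBN internally. Custom layers such as InPlaceABN are not touched by this conversion.

```python
import torch.nn as nn

# Toy stand-in for a network that uses ordinary BatchNorm (illustrative only).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# Replace every nn.BatchNorm*d layer with nn.SyncBatchNorm.
# Only the conversion is shown here; actually training with SyncBatchNorm
# additionally requires GPUs and an initialized torch.distributed process group.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

print(type(sync_model[1]).__name__)
```

The converted model is then normally wrapped in torch.nn.parallel.DistributedDataParallel, one process per GPU.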
How can I fix the error that mentions torch.autograd.set_detect_anomaly(True)? I only modified the environment. @hsfzxjy
I don't know what's going on with your code. The error means that an invalid in-place operation happened somewhere. To locate it, add torch.autograd.set_detect_anomaly(True) in main.py, just before if __name__ == "__main__":. Then re-run the code and post the full traceback, so we can work out how to solve it.
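As a minimal sketch of what this looks like in practice, the snippet below enables anomaly detection and then reproduces the same class of error on CPU. The toy tensors and the run_once helper are assumptions for illustration and are not taken from openseg.pytorch.

```python
import torch

# Enable anomaly detection once, before any forward/backward pass runs.
# In the thread's setup this line would sit near the top of main.py.
torch.autograd.set_detect_anomaly(True)


def run_once():
    """Trigger an in-place modification error and return its message."""
    x = torch.ones(3, requires_grad=True)
    y = x.exp()   # exp() saves its output for use in the backward pass
    y.add_(1)     # the in-place edit bumps the version of that saved tensor
    try:
        y.sum().backward()
    except RuntimeError as err:
        return str(err)
    return None


message = run_once()
print(message)
```

With anomaly detection on, PyTorch also emits a warning containing the traceback of the forward call that produced the failing gradient, which is the part worth posting.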
@hsfzxjy
sys:1: RuntimeWarning: Traceback of forward call that caused the error:
  File "/root/.local/conda/envs/mmdet-2.8/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/root/.local/conda/envs/mmdet-2.8/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/root/.local/conda/envs/mmdet-2.8/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/myWorkBase/code/openseg.pytorch-master/lib/models/nets/segfix.py", line 76, in forward
    x = self.backbone(x_)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/myWorkBase/code/openseg.pytorch-master/lib/models/backbones/hrnet/hrnet_backbone.py", line 735, in forward
    x_list.append(self.transition3i)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/myWorkBase/code/openseg.pytorch-master/lib/extensions/inplace_abn_1/bn.py", line 118, in forward
    self.training, self.momentum, self.eps, self.activation, self.slope)
Traceback (most recent call last):
File "../main.py", line 230, in
@LayneH Another question: if I use the new branch https://github.com/openseg-group/openseg.pytorch/tree/pytorch-1.7 , are the config options consistent with the old one? In other words, can I leave the JSON configs and the input arguments unmodified?
@dadada101 You can try changing every nn.ReLU(inplace=True) to nn.ReLU(inplace=False) in https://github.com/openseg-group/openseg.pytorch/blob/master/lib/models/backbones/hrnet/hrnet_backbone.py . This should solve the problem.
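If editing the source file is inconvenient, roughly the same effect can be sketched at runtime by flipping the inplace flag on every nn.ReLU module of an already-built model. The helper name disable_inplace_relu and the toy block below are hypothetical, not part of openseg.pytorch, and this approach does not reach functional calls such as F.relu(x, inplace=True) written directly in a forward method.

```python
import torch.nn as nn


def disable_inplace_relu(model: nn.Module) -> int:
    """Set inplace=False on every nn.ReLU in `model`; return how many changed."""
    changed = 0
    for module in model.modules():
        if isinstance(module, nn.ReLU) and module.inplace:
            # nn.ReLU reads self.inplace on each forward call,
            # so changing the attribute takes effect immediately.
            module.inplace = False
            changed += 1
    return changed


# Toy stand-in for a backbone block (illustrative only).
block = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(inplace=True),
)
num_changed = disable_inplace_relu(block)
print(num_changed, block[2].inplace)
```

Calling the helper on the full network right after it is constructed, before the first forward pass, would be the natural place for it.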
@hsfzxjy Excuse me, I have a question. When I run scripts/cityscapes/segfix/run_h_48_d_4_segfix.sh train 1, all the errors are 0. I don't know where I went wrong. Thank you for your advice.