voldemortX / pytorch-auto-drive

PytorchAutoDrive: Segmentation models (ERFNet, ENet, DeepLab, FCN...) and Lane detection models (SCNN, RESA, LSTR, LaneATT, BézierLaneNet...) based on PyTorch with fast training, visualization, benchmarking & deployment help
BSD 3-Clause "New" or "Revised" License

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 128, 45, 80]], which is output 0 of ReluBackward0, is at version 20; expected version 0 instead. #121

Closed mengxia1994 closed 1 year ago

mengxia1994 commented 1 year ago

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 128, 45, 80]], which is output 0 of ReluBackward0, is at version 20; expected version 0 instead.

mengxia1994 commented 1 year ago

I have checked many places and tried changing some `a += b` to `a = a + b`, and so on. Using RESA with ResNet-18.
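For reference, a minimal standalone sketch of this class of failure (not code from this repo): ReLU saves its output for the backward pass, so an in-place update on that output bumps its version counter, and `backward()` then raises exactly this error.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()
x = torch.randn(1, 128, 45, 80, requires_grad=True)

# Broken: ReLU saves its output for backward; the in-place add
# bumps that tensor's version counter, and backward() raises
# "output 0 of ReluBackward0 ... expected version 0".
y = relu(x)
y += 1.0
# y.sum().backward()  # <- would raise the RuntimeError above

# Fixed: the out-of-place add creates a new tensor, so the saved
# ReLU output stays at version 0.
y = relu(x)
y = y + 1.0
y.sum().backward()  # works
```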

voldemortX commented 1 year ago

@mengxia1994 Do you mean some `a += b` needs to be changed to `a = a + b`? Could you tell me which line of code it is?

voldemortX commented 1 year ago

@mengxia1994 Is this the same issue as #81? Maybe try PyTorch 1.6.0.

voldemortX commented 1 year ago

I think the PyTorch in-place gradient restrictions were tightened after 1.8.0. We might need a BC-breaking change here: replace all `add_` with `add`.

mengxia1994 commented 1 year ago

Thank you. I tried torch 1.6.0 and 1.7.0 and found they are not suitable for a 3060 Ti GPU... I will try on another machine. Is there any other method to solve this?

voldemortX commented 1 year ago

@mengxia1994 Maybe try replacing all `.add_(` with `.add(`.
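One caveat with that swap (illustrative snippet; `out` and `message` are hypothetical names, not the repo's actual identifiers): `.add()` returns a new tensor instead of mutating its input, so the result has to be assigned back.

```python
# The swap only helps if the result is reassigned, since .add()
# returns a new tensor instead of mutating its input:
out = out.add(message)            # was: out.add_(message)

# Caveat: for slice updates like out[:, :, i:i+1, :].add_(message),
# writing the result back into the slice is itself an in-place op
# on `out`, so a plain .add_ -> .add swap is not enough there
# (see the fuller non-inplace rewrite sketched below).
```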

voldemortX commented 1 year ago

A note here: maybe it is time to remove all in-place ops in SCNN & RESA.
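A sketch of what such a rewrite could look like for the row-by-row spatial message passing (illustrative names and shapes, not necessarily how the repo or the eventual PR implements it): keep each row as a separate tensor and concatenate at the end, so no tensor saved for backward is ever mutated.

```python
import torch
import torch.nn.functional as F

def spatial_pass_down(x, conv):
    """Top-to-bottom message passing (SCNN/RESA style) without
    in-place ops: each row update produces a fresh tensor, so
    autograd's saved tensors are never modified.

    x: [N, C, H, W]; conv: a 1xw convolution shared across rows.
    """
    rows = list(x.split(1, dim=2))   # H tensors of shape [N, C, 1, W]
    for i in range(1, len(rows)):
        # was (in-place): x[:, :, i:i+1, :].add_(F.relu(conv(x[:, :, i-1:i, :])))
        rows[i] = rows[i] + F.relu(conv(rows[i - 1]))
    return torch.cat(rows, dim=2)

# Usage with hypothetical sizes matching the error message:
conv = torch.nn.Conv2d(128, 128, kernel_size=(1, 9), padding=(0, 4), bias=False)
out = spatial_pass_down(torch.randn(1, 128, 45, 80), conv)
```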

PannenetsF commented 1 year ago

Hi, I just implemented a non-inplace version of SCNN and RESA, which gets the same output as the previous version. Do you want a PR for it?

voldemortX commented 1 year ago

> Hi, I just implemented a non-inplace version of SCNN and RESA, which gets the same output as the previous version. Do you want a PR for it?

Sounds great! Let's have that PR. Could you check whether the gradient behavior is also the same as before? Then it may not need to be a BC-break, and the ONNX conversion code could be simplified as well.
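A possible harness for that check (hypothetical; `old_module` and `new_module` stand in for the in-place and non-inplace implementations): copy the weights across, run both on the same input, and compare outputs and input gradients. Note the in-place version only backpropagates on a PyTorch version where it still works (e.g. 1.6.0), so the comparison would have to run there.

```python
import torch

def grads_match(old_module, new_module, shape=(1, 128, 45, 80), atol=1e-6):
    """Check that two implementations give the same output and the
    same input gradient for identical weights and inputs."""
    new_module.load_state_dict(old_module.state_dict())  # identical weights

    x1 = torch.randn(*shape, requires_grad=True)
    x2 = x1.detach().clone().requires_grad_(True)

    y1, y2 = old_module(x1), new_module(x2)
    y1.sum().backward()
    y2.sum().backward()

    return (torch.allclose(y1, y2, atol=atol)
            and torch.allclose(x1.grad, x2.grad, atol=atol))
```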