If you run this program without modifying anything, the line above (trainer.py, Line 170) raises:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 32, 3, 3]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
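As the hint in the error message suggests, enabling anomaly detection makes autograd report which forward operation produced the tensor that was later modified in place. A minimal sketch (the surrounding training step is the one from trainer.py, elided here):

```python
import torch

# Debugging only: anomaly detection slows training noticeably, so remove it
# once the offending operation has been located.
torch.autograd.set_detect_anomaly(True)

# ... run one training iteration as usual; the RuntimeError will now carry a
# second traceback pointing at the forward op whose saved tensor was modified ...
```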
I have run into this annoying bug several times with PyTorch 1.8, CUDA 11.2 and an NVIDIA RTX 3090. I suppose that after calling "first_MaskL1Loss.backward(retain_graph=True)" and "optimizer_g1.step()", first_MaskL1Loss can no longer be reused in the total loss above (presumably because optimizer_g1.step() updates the weights in place, invalidating the activations the retained graph saved for the second backward pass). I don't know the exact reason, but I managed to run the code error-free with the following modification:
```python
# Generator output
for repeated_idx in range(2):  # Added: when repeated_idx == 0, call first_MaskL1Loss.backward(); otherwise skip it
    first_out, second_out = generator(img, mask)
    ...
    ...
    if repeated_idx % 2 == 0:
        optimizer_g1.zero_grad()
        first_MaskL1Loss.backward()  # retain_graph=True no longer needed
        optimizer_g1.step()
# the rest is not modified
optimizer_g.zero_grad()
# Get the deep semantic feature maps, and compute Perceptual Loss
img_featuremaps = perceptualnet(img)  # feature maps
```
Another benefit of the above modification is that retain_graph=True is no longer required, which saves GPU memory.
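For what it's worth, the failure can be reproduced outside this project. Below is a minimal sketch with a toy model and optimizer (my own stand-ins, not the code from trainer.py): stepping an optimizer between two backward passes through the same graph modifies the saved weights in place and raises exactly this error, while redoing the forward pass after the step, as in the modification above, avoids it without retain_graph=True.

```python
import torch
import torch.nn as nn

# Toy stand-ins (assumptions; not the project's generator or optimizers).
net = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
x = torch.randn(2, 4)

# Failing pattern: backward, optimizer step, then a second backward through the SAME graph.
out = net(x)
loss1 = out.mean()
loss2 = out.pow(2).mean()

loss1.backward(retain_graph=True)   # keep the graph alive for the second backward
opt.step()                          # in-place weight update bumps the parameters' version counters
try:
    loss2.backward()                # still needs the old weights saved in the graph -> fails
except RuntimeError as err:
    print("reproduced:", err)       # "... modified by an inplace operation ..."

# Working pattern (same idea as the modification above): redo the forward pass after the step.
opt.zero_grad()
out = net(x)                        # fresh graph built with the updated weights
loss2 = out.pow(2).mean()
loss2.backward()                    # no retain_graph needed, no in-place error
opt.step()
```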
Another issue: this project also seems to throw errors when training with DDP (DistributedDataParallel).
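For reference, this is the standard single-node DDP setup I would assume when trying to reproduce that error (a sketch only; the project's actual launch code is not shown here, and the tiny Conv2d is a stand-in for the real generator):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumption: launched with `torchrun --nproc_per_node=<num_gpus> repro.py`,
# so LOCAL_RANK / RANK / WORLD_SIZE are provided by the launcher.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

generator = nn.Conv2d(4, 3, 3, padding=1)        # stand-in for the project's generator
generator = generator.cuda(local_rank)
generator = DDP(generator, device_ids=[local_rank])

# ... build the optimizers and run the training loop from trainer.py as usual ...

dist.destroy_process_group()
```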