Unable to get gradients from pretrained models due to inplace modification

wuhanstudio commented 2 years ago

🐛 Describe the bug

Hi, I recently started using pretrained models from torchvision for my research, but I cannot get gradients with respect to the input images because of an in-place modification (copy_).

Code to reproduce the error:

import torch
import torchvision

# Get pretrained FasterRCNN model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Construct input data: one image with 11 boxes and 11 labels
images = torch.rand(1, 3, 600, 1200)
boxes = torch.rand(1, 11, 4)
labels = torch.randint(1, 91, (1, 11))
images.requires_grad = True

# Make sure these are valid boxes x2 > x1 & y2 > y1
boxes[:, :, 2] = boxes[:, :, 0] * 2
boxes[:, :, 3] = boxes[:, :, 1] * 2

targets = []
for i in range(len(images)):
    d = {}
    d['boxes'] = boxes[i]
    d['labels'] = labels[i]
    targets.append(d)

output = model(images, targets)
output['loss_classifier'].backward()

images.grad.data

Error traceback in torchvision v0.10.1+cpu:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-1da46601a617> in <module>
     20     targets.append(d)
     21 
---> 22 output = model(images, targets)
     23 output['loss_classifier'].backward()
     24 

d:\python37\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

d:\python37\lib\site-packages\torchvision\models\detection\generalized_rcnn.py in forward(self, images, targets)
     75             original_image_sizes.append((val[0], val[1]))
     76 
---> 77         images, targets = self.transform(images, targets)
     78 
     79         # Check for degenerate boxes

d:\python37\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

d:\python37\lib\site-packages\torchvision\models\detection\transform.py in forward(self, images, targets)
    118 
    119         image_sizes = [img.shape[-2:] for img in images]
--> 120         images = self.batch_images(images, size_divisible=self.size_divisible)
    121         image_sizes_list: List[Tuple[int, int]] = []
    122         for image_size in image_sizes:

d:\python37\lib\site-packages\torchvision\models\detection\transform.py in batch_images(self, images, size_divisible)
    222         batched_imgs = images[0].new_full(batch_shape, 0)
    223         for img, pad_img in zip(images, batched_imgs):
--> 224             pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
    225 
    226         return batched_imgs

RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.

Error traceback in torchvision v0.11.2+cpu:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-21-48cdef5bd896> in <module>
     22 output['loss_classifier'].backward()
     23 
---> 24 images[0].grad.data

AttributeError: 'NoneType' object has no attribute 'data'
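
One possible reading of this second traceback, assuming `images` is a single batched tensor as in the reproduction code: `images[0]` is a new non-leaf tensor produced by indexing, and autograd only populates `.grad` on leaf tensors. The sketch below uses a stand-in loss (not the Faster R-CNN loss) to show the difference.

```python
import warnings

import torch

# Stand-in for the input batch; requires_grad=True makes it a leaf tensor.
images = torch.rand(1, 3, 8, 8, requires_grad=True)

# Hypothetical stand-in for the model loss, used only for illustration.
loss = (images * 2).sum()
loss.backward()

# The leaf tensor accumulates the gradient.
leaf_grad = images.grad

# Indexing returns a new non-leaf tensor, so its .grad is None.
# PyTorch emits a UserWarning on this access; it is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    indexed = images[0]
    indexed_grad = indexed.grad

print(leaf_grad.shape)
print(indexed.is_leaf, indexed_grad)
```

Under that assumption, the per-image gradient would be read as `images.grad[0]` rather than `images[0].grad`.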

Versions

PyTorch version: 1.10.1+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19041-SP0
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy==0.790
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.18.4
[pip3] numpy-ml==0.1.2
[pip3] torch==1.10.1
[pip3] torchaudio==0.9.1
[pip3] torchfile==0.1.0
[pip3] torchnet==0.0.4
[pip3] torchvision==0.11.2
[conda] Could not collect

cc @datumbox

datumbox commented 2 years ago

@wuhanstudio I can't reproduce on the latest main of TorchVision. Could you please check?

wuhanstudio commented 2 years ago

Hi, @datumbox,

Thanks for your kind help.

I just installed both torch and torchvision from source. Indeed, this issue has been resolved in the main branch by:

https://github.com/pytorch/vision/blob/aef9964e73992268839943caa526c7525be13027/torchvision/models/detection/transform.py#L202-L209

However, I noticed that it was resolved only as a side effect of a workaround for a separate ONNX issue. If the following line is restored once ONNX supports it, the gradient problem may reappear.

# pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
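
For illustration, padding can also be done without any in-place write, which avoids the view error entirely. The sketch below is a hypothetical helper (`batch_images_out_of_place` is not part of torchvision and is not the code on main); it assumes each image is a `(C, H, W)` tensor and pads to a multiple of `size_divisible` with `torch.nn.functional.pad`.

```python
import torch
import torch.nn.functional as F


def batch_images_out_of_place(images, size_divisible=32):
    """Zero-pad (C, H, W) images to a common size and stack them.

    Hypothetical helper for illustration only; not torchvision's implementation.
    """
    max_h = max(img.shape[-2] for img in images)
    max_w = max(img.shape[-1] for img in images)
    # Round each dimension up to the next multiple of size_divisible.
    max_h = -(-max_h // size_divisible) * size_divisible
    max_w = -(-max_w // size_divisible) * size_divisible
    # F.pad returns a new tensor, so autograd can track it back to each input.
    # Pad order is (left, right, top, bottom) for the last two dimensions.
    padded = [
        F.pad(img, (0, max_w - img.shape[-1], 0, max_h - img.shape[-2]))
        for img in images
    ]
    return torch.stack(padded)


imgs = [
    torch.rand(3, 30, 40, requires_grad=True),
    torch.rand(3, 50, 20, requires_grad=True),
]
batched = batch_images_out_of_place(imgs)
batched.sum().backward()
print(batched.shape, imgs[0].grad.shape)
```

Because every step is out of place, gradients flow back to each input image regardless of how the batch tensor was allocated.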

Thanks.

datumbox commented 2 years ago

We plan to refactor this code in the near future, so we can resolve it there. I'm going to close the issue as I believe the matter is resolved; if you still have concerns, feel free to reopen. Thanks!

pytorch / vision

Unable to get gradients from pretrained models due to inplace modification #5217
