Closed ChrisHJC closed 3 years ago
Hi,
I had the exact same problem a few hours ago, and after some coding and debugging I now have some results. My task was to apply Grad-CAM to a Faster R-CNN model to gain some insight into the backbone of my model (ResNet-34). To use this nice project, I needed to adjust the __call__ functions of ModelOutputs and GuidedBackpropReLUModel. I then injected my custom functions into the project and used it as normal.
Customizing the ModelOutputs.__call__() function was needed because the FasterRCNN model contains several additional layers and transformations compared to, e.g., the VGG architectures. To be able to use all other features of the pytorch_grad_cam package, I customized this call to find the deepest layer of the backbone network, save its activation and gradient information by calling the feature_extractor there, and then continue with the normal forward pass of the FasterRCNN implementation. To do this I reassembled the original implementation in GeneralizedRCNN.forward() (which can be found in torchvision.models.detection.generalized_rcnn.py:44), and here and there got rid of tensors with too many dimensions. I did this because I'm only interested in the best prediction for this image.
The GuidedBackpropReLUModel.__call__() was altered in a similar fashion.
Overall, this solution is a simplification: it discards a lot of the FasterRCNN functionality and reduces the object detection network to an image classification network.
I hope this helps a bit.
Hi @thisismexp could you please share the code you have put together (if possible)? That'd be super useful!
@thisismexp I am trying to implement Grad-CAM for Faster R-CNN (detectron2 with the config file faster_rcnn_X_101_32x8d_FPN_3x.yaml), and I have a problem: all the elements of the CAM (before the ReLU) are negative. Why did you use guided backpropagation? Do you think my problem is related to using plain backpropagation instead of the guided one?
@hanikh what is the target layer / loss function you are using? Is it for classification, or bounding box regression?
@jacobgil I am using detectron2 with the config file "faster_rcnn_X_101_32x8d_FPN_3x.yaml". The target layer is "roi_heads.box_pooler" (this is the layer that pools the boxes from the feature maps of the different levels). I calculate the gradient for the best score and NOT for the bounding boxes.
Unfortunately, I cannot share all the code with a broader audience because I signed an NDA. However, I will send you some files via mail and hope this helps you out.
@hanikh honestly I have no idea, but maybe this project might help, as they are using detectron2 in their example: Grad-CAM.pytorch
@thisismexp I have seen that before. It didn't help me.
I would like to help with this, so a code snippet I can run to reproduce the problem would be great. Otherwise I will try reproducing it myself, but it might take me a few days or more to get to it.
As a side note in case it helps:
Try replacing with a non-gradient method, like ScoreCAM? It might help get insights.
from pytorch_grad_cam import ScoreCAM
Something I'm not sure about - What is the width/height size of the gradients it computes? Can you check if it's the size of the entire image, or the size of the bounding box?
In case it's the entire image - maybe it makes sense it will be negative, since most of the image is not the object. Maybe crop the activations/gradients to the bounding box location, if you aren't already doing it?
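To make the cropping idea concrete, here is a minimal NumPy sketch with made-up toy numbers. It is not taken from anyone's model in this thread. It assumes the usual Grad-CAM recipe (channel weights are the spatial mean of the gradients, the map is the weighted sum of the activations), and the function name grad_cam_map and the box format (x1, y1, x2, y2) in feature-map coordinates are hypothetical choices for illustration.

```python
import numpy as np


def grad_cam_map(activations, gradients, box=None):
    """Return (raw_map, relu_map) for one image.

    activations, gradients: arrays of shape (K, H, W).
    box: optional (x1, y1, x2, y2) in feature-map coordinates
         (a hypothetical convention for this sketch).
    """
    if box is not None:
        x1, y1, x2, y2 = box
        # Restrict both tensors to the box before averaging.
        activations = activations[:, y1:y2, x1:x2]
        gradients = gradients[:, y1:y2, x1:x2]
    # One weight per channel: spatial mean of the gradients.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over channels gives an (H, W) map.
    raw_map = np.tensordot(weights, activations, axes=1)
    return raw_map, np.maximum(raw_map, 0.0)


# Toy data: one channel, positive gradients only inside a 2x2 "object".
activations = np.ones((1, 4, 4))
gradients = -np.ones((1, 4, 4))
gradients[:, 1:3, 1:3] = 1.0
box = (1, 1, 3, 3)

raw_full, cam_full = grad_cam_map(activations, gradients)
raw_box, cam_box = grad_cam_map(activations, gradients, box=box)

print(raw_full.max())  # -0.5: the background dominates the average
print(cam_full.max())  # 0.0: the ReLU wipes out the whole map
print(raw_box.min())   # 1.0: inside the box the map is positive
```

In this toy case the background gradients outweigh the object gradients in the spatial mean, so the full-image map is negative everywhere and the ReLU leaves nothing. Whether that is what happens with roi_heads.box_pooler is something only the actual tensor shapes can tell.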
@jacobgil may I have your email to share some parts of the code with you?
jacob.gildenblat@gmail.com
Hi, I'm working on this as well and attempted exactly what @hanikh did: setting the target layer to roi_heads.box_pooler.
However, the result doesn't look right.
May I ask if you guys have solved it? Much appreciated. Thank you
@hanikh @jacobgil I am also trying to generate CAMs for detectron2, but with the config file faster_rcnn_R_101_FPN_3x.yaml. When I pass the image to ScoreCAM or EigenCAM, they require the input as a tensor, as in grayscale_cam = cam(input_tensor, targets=targets), whereas the detectron2 model requires a list of dictionaries as input. Because of this I get the following error in detectron2's batched inputs:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-19-f8134a721b63> in <module>
6 reshape_transform=fasterrcnn_reshape_transform)
7
----> 8 grayscale_cam = cam(input_tensor, targets=targets)
9 # Take the first image in the batch:
10 grayscale_cam = grayscale_cam[0, :]
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/pytorch_grad_cam/base_cam.py in __call__(self, input_tensor, targets, aug_smooth, eigen_smooth)
183
184 return self.forward(input_tensor,
--> 185 targets, eigen_smooth)
186
187 def __del__(self):
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/pytorch_grad_cam/base_cam.py in forward(self, input_tensor, targets, eigen_smooth)
72 requires_grad=True)
73
---> 74 outputs = self.activations_and_grads(input_tensor)
75 if targets is None:
76 target_categories = np.argmax(outputs.cpu().data.numpy(), axis=-1)
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/pytorch_grad_cam/activations_and_gradients.py in __call__(self, x)
40 self.gradients = []
41 self.activations = []
---> 42 return self.model(x)
43
44 def release(self):
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/detectron2/modeling/meta_arch/rcnn.py in forward(self, batched_inputs)
144 """
145 if not self.training:
--> 146 return self.inference(batched_inputs)
147
148 images = self.preprocess_image(batched_inputs)
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/detectron2/modeling/meta_arch/rcnn.py in inference(self, batched_inputs, detected_instances, do_postprocess)
197 assert not self.training
198
--> 199 images = self.preprocess_image(batched_inputs)
200 features = self.backbone(images.tensor)
201
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/detectron2/modeling/meta_arch/rcnn.py in preprocess_image(self, batched_inputs)
222 Normalize, pad and batch the input images.
223 """
--> 224 images = [x["image"].to(self.device) for x in batched_inputs]
225 images = [(x - self.pixel_mean) / self.pixel_std for x in images]
226 images = ImageList.from_tensors(images, self.backbone.size_divisibility)
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/detectron2/modeling/meta_arch/rcnn.py in <listcomp>(.0)
222 Normalize, pad and batch the input images.
223 """
--> 224 images = [x["image"].to(self.device) for x in batched_inputs]
225 images = [(x - self.pixel_mean) / self.pixel_std for x in images]
226 images = ImageList.from_tensors(images, self.backbone.size_divisibility)
IndexError: too many indices for tensor of dimension 2
Please help me solve this issue.
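One possible direction, as a sketch only: the traceback shows preprocess_image() reading x["image"] from every element of batched_inputs, so the model has to be handed a list of dicts rather than the tensor itself. The helper below (to_batched_inputs is a hypothetical name, not part of detectron2 or pytorch_grad_cam) turns an (N, C, H, W) batch into that format. It is written against anything with a .shape that can be iterated along the first axis, and it uses a NumPy array as a stand-in for the torch tensor so that it runs without detectron2 installed.

```python
import numpy as np


def to_batched_inputs(input_tensor):
    """Convert an (N, C, H, W) batch into a detectron2-style list of dicts.

    Each dict carries the "image" key that preprocess_image() reads in the
    traceback above, plus "height" and "width" for the output resolution.
    """
    shape = tuple(input_tensor.shape)
    if len(shape) != 4:
        raise ValueError(f"expected an (N, C, H, W) batch, got shape {shape}")
    height, width = shape[-2:]
    return [
        {"image": image, "height": int(height), "width": int(width)}
        for image in input_tensor
    ]


# Stand-in for the torch tensor that the CAM object passes to the model.
batch = np.zeros((2, 3, 4, 5), dtype=np.float32)
batched_inputs = to_batched_inputs(batch)
print(len(batched_inputs), batched_inputs[0]["image"].shape)
```

A helper like this could be called inside the forward() of a thin torch.nn.Module wrapper around the detectron2 model, so that the CAM object keeps passing a plain tensor. That wrapper is an assumption on my side and untested here. The values in the tensor would also still need to match the channel order and value range that the config expects, and the list of outputs the model returns has to be handled by the targets.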
I have been using an FPN network structure recently, but I have been unable to visualize it properly with Grad-CAM. If anyone knows how to write the code, please let me know. Thanks a lot.