jacobgil / pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
https://jacobgil.github.io/pytorch-gradcam-book
MIT License

Grad CAM for multiple input arguments #501

Closed · IshitaB28 closed this issue 1 month ago

IshitaB28 commented 1 month ago

I am trying to use GradCAM on my model, which takes more than one input argument. Since I have a list of 4 arguments, I tried passing `*input_tensor` instead of `input_tensor` and modifying the source code accordingly, but I am facing a series of errors. I am at this point now:

```
Traceback (most recent call last):
  File "/home/ishita-wicon/Documents/QA/ISIQA/UNET/expl_exp.py", line 449, in <module>
    grayscale_cam = cam(input_tensor=[left_patches, right_patches, left_image_patches, right_image_patches], targets=targets)
  File "/home/ishita-wicon/.local/lib/python3.10/site-packages/pytorch_grad_cam/base_cam.py", line 192, in __call__
    return self.forward(input_tensor,
  File "/home/ishita-wicon/.local/lib/python3.10/site-packages/pytorch_grad_cam/base_cam.py", line 105, in forward
    cam_per_layer = self.compute_cam_per_layer(*input_tensor,  # changed to * for the list
TypeError: BaseCAM.compute_cam_per_layer() takes 4 positional arguments but 7 were given
```

Is there any other way?

Studentpengyu commented 1 month ago

Hi there, I am facing a similar issue. My model requires an input that consists of a list containing two tensors. How did you handle this? Could you share your solution?

IshitaB28 commented 1 month ago

Hi, so I handled it by combining the inputs into one tensor and then separating them out in the forward function of my model.

For example, if you need to feed 4 inputs into your model, you do:

```python
inp = torch.cat((a, b, c, d), dim=0)
grayscale_cam = cam(input_tensor=inp, targets=None)
```

And then, in the forward function of your model:

```python
def forward(self, inp):
    a, b, c, d = inp[0], inp[1], inp[2], inp[3]
```

and then proceed with the further steps.
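Putting it together, a minimal runnable sketch of this pattern could look like the following (the toy model, target layer, and shapes are made up for illustration; your real model will differ):

```python
import torch
import torch.nn as nn
from pytorch_grad_cam import GradCAM

class MultiInputNet(nn.Module):
    """Toy model: the real inputs are four images, but forward receives one stacked tensor."""
    def __init__(self):
        super().__init__()
        # the backbone sees the four inputs fused along the channel axis (4 * 3 = 12 channels)
        self.backbone = nn.Sequential(
            nn.Conv2d(12, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32, 10)

    def forward(self, inp):
        # inp: (4, 3, H, W) -> recover the four original inputs (slicing keeps the batch dim)
        a, b, c, d = inp[0:1], inp[1:2], inp[2:3], inp[3:4]
        x = torch.cat((a, b, c, d), dim=1)           # (1, 12, H, W)
        feats = self.backbone(x)                     # (1, 32, H, W)
        return self.head(feats.mean(dim=(2, 3)))     # (1, 10)

model = MultiInputNet().eval()

# four single-image inputs, stacked along dim 0 exactly as in the snippet above
a, b, c, d = (torch.randn(1, 3, 64, 64) for _ in range(4))
inp = torch.cat((a, b, c, d), dim=0)                 # (4, 3, 64, 64)

cam = GradCAM(model=model, target_layers=[model.backbone[2]])
grayscale_cam = cam(input_tensor=inp, targets=None)  # targets=None uses the highest-scoring class
print(grayscale_cam.shape)                           # (1, 64, 64): one CAM, sized like the input
```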

Studentpengyu commented 1 month ago

Thank you for your prompt response. I will try it!

Studentpengyu commented 1 month ago

Hi, I have modified the input format by first combining the inputs into one tensor and then separating them out in the forward function of my model.

In my case, my two inputs are an image tensor and a text tensor. Since the two have different sizes, direct concatenation is not possible, so I flatten both tensors and concatenate them along the first dimension, recording their lengths and shapes so I can separate them again later. The model works well this way. However, there is still an issue.
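For reference, the flatten-and-split idea can look roughly like this self-contained sketch (the encoders, shapes, and names are placeholders, not my actual model):

```python
import torch
import torch.nn as nn

class FlatTextImageModel(nn.Module):
    """Toy wrapper: takes one flat 1-D tensor and splits it back into text and image parts."""
    def __init__(self, text_shape=(1, 77, 512), image_shape=(1, 3, 224, 224)):
        super().__init__()
        self.text_shape = text_shape
        self.image_shape = image_shape
        self.text_numel = torch.Size(text_shape).numel()
        self.image_numel = torch.Size(image_shape).numel()
        self.image_encoder = nn.Conv2d(3, 8, 3, padding=1)   # stand-in for the real image encoder
        self.text_proj = nn.Linear(512, 8)                    # stand-in for the real text branch
        self.head = nn.Linear(16, 2)

    def forward(self, inp):
        # inp: 1-D tensor of length text_numel + image_numel, in the same order as the concatenation
        text_flat, image_flat = inp[:self.text_numel], inp[self.text_numel:]
        text = text_flat.reshape(self.text_shape)
        image = image_flat.reshape(self.image_shape)
        img_feat = self.image_encoder(image).mean(dim=(2, 3))  # (1, 8)
        txt_feat = self.text_proj(text).mean(dim=1)            # (1, 8)
        return self.head(torch.cat((txt_feat, img_feat), dim=1))

# build the flat input the same way as in the snippet below
text = torch.randn(1, 77, 512)
image = torch.randn(1, 3, 224, 224)
inp = torch.cat((text.flatten(), image.flatten()), dim=0)

model = FlatTextImageModel()
print(model(inp).shape)  # (1, 2)
```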

Here is a snippet of my code:

```python
inp = torch.cat((text_flat, image_flat), dim=0)
grayscale_cam = cam(input_tensor=inp, targets=None)
```

The returned grayscale_cam is flattened too, and its size is not what I expected: it equals the combined size of the text and image tensors, whereas I expected it to match the image tensor, since I need to display grayscale_cam on the image.

Therefore, I extracted the image part, but the resulting image was completely incorrect.

[screenshot of the incorrect result]

Studentpengyu commented 1 month ago

Hi there, I wanted to let you know that my issue has been resolved.

Here is my solution: I moved the text features into the forward function as fixed, precomputed features. Since my text features are extracted with a large language model and do not need to be updated, this approach works well.
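For anyone else who hits this, a rough sketch of the "fixed text features inside the model" idea looks like the following (all model names, layers, and shapes are illustrative stand-ins, not my actual code):

```python
import torch
import torch.nn as nn
from pytorch_grad_cam import GradCAM

class ToyMultiModalModel(nn.Module):
    """Stand-in for the real image+text model."""
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32 + 512, 2)

    def forward(self, image, text_features):
        img_feat = self.image_encoder(image).mean(dim=(2, 3))          # (1, 32)
        return self.head(torch.cat((img_feat, text_features), dim=1))  # (1, 2)

class ImageOnlyWrapper(nn.Module):
    """Holds the precomputed text features, so GradCAM only ever sees the image tensor."""
    def __init__(self, model, text_features):
        super().__init__()
        self.model = model
        # buffer: moves with .to(device), never trained or updated
        self.register_buffer("text_features", text_features)

    def forward(self, image):
        return self.model(image, self.text_features)

precomputed_text = torch.randn(1, 512)          # stand-in for the LLM text features
image_tensor = torch.randn(1, 3, 224, 224)

wrapped = ImageOnlyWrapper(ToyMultiModalModel(), precomputed_text).eval()
cam = GradCAM(model=wrapped, target_layers=[wrapped.model.image_encoder[2]])

grayscale_cam = cam(input_tensor=image_tensor, targets=None)[0]  # (224, 224), matches the image
```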

Thank you for your help!

IshitaB28 commented 1 month ago

Hello, good to know that it's solved now. Thanks for sharing the issue and the solution!