jacobgil / pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
https://jacobgil.github.io/pytorch-gradcam-book
MIT License
10.53k stars, 1.56k forks

the multi-input issue #341

Open zhouqunbing opened 2 years ago

zhouqunbing commented 2 years ago

Hello, if I want to use two inputs to segment an image, how should I modify the code? For example, RGB: (B, 3, H, W) and Depth: (B, 1, H, W). I use the RGB and depth as inputs and finally get the segmentation map, so I want to see the CAM of some layers in the network.

jacobgil commented 2 years ago

Hi, some clarifying questions:

  • Do you want to get the CAM with respect to the RGB pixels, the depth pixels, or both?
  • Is the input tensor actually (B, 4, H, W) (4 = 3 + 1, i.e. they are concatenated), or is it something more complex?

zhouqunbing commented 2 years ago

Thank you for your reply.

1. I want to get the CAM for both, or for the RGB pixels.
2. The RGB and depth inputs are fed into ResNets separately: the RGB is fed into one ResNet and the depth into another (the two ResNets are the same). I then do some operations within the ResNets and finally get one output with dimensions (B, 512, H//32, W//32).

jacobgil commented 2 years ago

I'm not sure what "separately" means here. If the depth is grayscale (one channel), how can it be fed into the same ResNet? Will it be duplicated to have 3 channels?

The details of this matter for the solution. Is there a chance you can provide some code or pseudocode showing what the forward pass looks like?

zhouqunbing commented 2 years ago

1. The depth is a grayscale image whose pixel values range from 0 to 255, but the value of each pixel stands for the distance between the camera and the object.
2. The structure may be like this: [image]
3. The forward function is like this:

def forward(self, rgb, depth):
    rgb = self.encoder_rgb.forward_first_conv(rgb)
    depth = self.encoder_depth.forward_first_conv(depth)

    if self.fuse_depth_in_rgb_encoder == 'add':
        fuse = rgb + depth
    else:
        fuse = self.se_layer0(rgb, depth)

    rgb = F.max_pool2d(fuse, kernel_size=3, stride=2, padding=1)
    depth = F.max_pool2d(depth, kernel_size=3, stride=2, padding=1)

    # block 1
    rgb = self.encoder_rgb.forward_layer1(rgb)
    depth = self.encoder_depth.forward_layer1(depth)
    if self.fuse_depth_in_rgb_encoder == 'add':
        fuse = rgb + depth
    else:
        fuse = self.se_layer1(rgb, depth)
    skip1 = self.skip_layer1(fuse)

    # block 2
    rgb = self.encoder_rgb.forward_layer2(fuse)
    depth = self.encoder_depth.forward_layer2(depth)
    if self.fuse_depth_in_rgb_encoder == 'add':
        fuse = rgb + depth
    else:
        fuse = self.se_layer2(rgb, depth)
    skip2 = self.skip_layer2(fuse)

    # block 3
    rgb = self.encoder_rgb.forward_layer3(fuse)
    depth = self.encoder_depth.forward_layer3(depth)
    if self.fuse_depth_in_rgb_encoder == 'add':
        fuse = rgb + depth
    else:
        fuse = self.se_layer3(rgb, depth)
    skip3 = self.skip_layer3(fuse)

    # block 4
    rgb = self.encoder_rgb.forward_layer4(fuse)
    depth = self.encoder_depth.forward_layer4(depth)
    if self.fuse_depth_in_rgb_encoder == 'add':
        fuse = rgb + depth
    else:
        fuse = self.se_layer4(rgb, depth)

jacobgil commented 2 years ago

Got it! If self.fuse_depth_in_rgb_encoder is not 'add', I think the easiest way to start would be to set target_layer = se_layer4. It would then visualize the combined heatmap for both of them.

If you want to visualize only RGB, for example, you can set target_layer = encoder_rgb.forward_layer4.

Does this work?
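
For anyone trying the two options above, here is a minimal sketch of what they amount to, written with plain PyTorch hooks rather than the library's own classes. Everything in it is an assumption made for illustration: ToyFusionNet is a hypothetical stand-in for the two-encoder model, `fuse` plays the role of se_layer4, `rgb_branch` plays the role of the last RGB encoder stage, and the score being explained is simply the sum of one class's logits over all pixels. Note that `register_forward_hook` exists only on nn.Module objects, so if forward_layer4 is a method rather than a submodule, the module it calls would have to be targeted instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyFusionNet(nn.Module):
    """Hypothetical two-branch segmentation net (not the model from this thread)."""

    def __init__(self, num_classes=3):
        super().__init__()
        self.rgb_branch = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU())
        self.depth_branch = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(16, 8, 1)  # assumed analogue of se_layer4
        self.head = nn.Conv2d(8, num_classes, 1)

    def forward(self, rgb, depth):
        r = self.rgb_branch(rgb)
        d = self.depth_branch(depth)
        fused = self.fuse(torch.cat([r, d], dim=1))
        logits = self.head(fused)
        # Upsample the logits back to the input resolution.
        return F.interpolate(logits, size=rgb.shape[-2:], mode="bilinear", align_corners=False)


def grad_cam(model, layer, rgb, depth, class_idx):
    """Grad-CAM for one layer: activations weighted by spatially averaged gradients."""
    store = {}

    def save_activation(module, inputs, output):
        store["act"] = output
        # Capture the gradient of the score with respect to this activation.
        output.register_hook(lambda grad: store.update(grad=grad))

    handle = layer.register_forward_hook(save_activation)
    try:
        logits = model(rgb, depth)
        score = logits[:, class_idx].sum()  # assumed target: all pixels of one class
        model.zero_grad()
        score.backward()
    finally:
        handle.remove()

    weights = store["grad"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=rgb.shape[-2:], mode="bilinear", align_corners=False)
    # Scale each map to [0, 1] for display.
    cam = cam - cam.amin(dim=(2, 3), keepdim=True)
    cam = cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-8)
    return cam[:, 0].detach()


torch.manual_seed(0)
model = ToyFusionNet().eval()
rgb = torch.rand(2, 3, 32, 32)
depth = torch.rand(2, 1, 32, 32)

cam_fused = grad_cam(model, model.fuse, rgb, depth, class_idx=1)      # combined heatmap
cam_rgb = grad_cam(model, model.rgb_branch, rgb, depth, class_idx=1)  # RGB branch only
print(cam_fused.shape, cam_rgb.shape)
```

Hooking the fusion module gives one map that reflects both inputs, while hooking a module inside the RGB encoder gives a map built only from RGB-branch activations (its gradients still depend on the depth input through the fused output).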

zhouqunbing commented 2 years ago

@jacobgil Thank you for your reply. Practice is the sole criterion for testing truth. I will revise the code.

IceHowe commented 1 year ago

Hello, did you solve this problem? I also need to visualize the heat map of a Siamese network.

chen-yuu commented 10 months ago

Has this problem been solved? How is the heat map computed with two inputs?

zhouqunbing commented 10 months ago

Sorry, it was not solved; I ended up not needing it afterwards.
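
Since the thread ends without a worked answer to the two-input question, here is one possible workaround, sketched under assumptions. The library's CAM classes are called with a single `input_tensor`, so a model whose forward takes (rgb, depth) could be wrapped so that it accepts one concatenated (B, 4, H, W) tensor and splits it internally. TwoInputNet and SingleTensorWrapper are hypothetical names made up for this sketch, and the toy model is not the network discussed above.

```python
import torch
import torch.nn as nn


class TwoInputNet(nn.Module):
    """Hypothetical two-input model standing in for an RGB-D network."""

    def __init__(self):
        super().__init__()
        self.rgb_conv = nn.Conv2d(3, 4, 3, padding=1)
        self.depth_conv = nn.Conv2d(1, 4, 3, padding=1)
        self.head = nn.Conv2d(4, 2, 1)

    def forward(self, rgb, depth):
        # Fuse the two branches by addition, then predict per-pixel logits.
        return self.head(self.rgb_conv(rgb) + self.depth_conv(depth))


class SingleTensorWrapper(nn.Module):
    """Accepts one (B, C_rgb + C_depth, H, W) tensor and splits it into two inputs."""

    def __init__(self, model, rgb_channels=3):
        super().__init__()
        self.model = model
        self.rgb_channels = rgb_channels

    def forward(self, x):
        rgb = x[:, :self.rgb_channels]    # first 3 channels
        depth = x[:, self.rgb_channels:]  # remaining channel(s)
        return self.model(rgb, depth)


torch.manual_seed(0)
model = TwoInputNet().eval()
wrapped = SingleTensorWrapper(model)

rgb = torch.rand(2, 3, 16, 16)
depth = torch.rand(2, 1, 16, 16)
packed = torch.cat([rgb, depth], dim=1)  # shape (2, 4, 16, 16)

with torch.no_grad():
    direct = model(rgb, depth)
    via_wrapper = wrapped(packed)

print(packed.shape, torch.allclose(direct, via_wrapper))
```

The wrapped model could then, presumably, be passed to something like `GradCAM(model=wrapped, target_layers=[wrapped.model.head])` with `packed` as the `input_tensor`. A segmentation model would also need a suitable target that reduces the output map to a scalar, and none of this has been verified against the model in this thread.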
