jacobgil / pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
https://jacobgil.github.io/pytorch-gradcam-book
MIT License
9.79k stars 1.52k forks

Support for 3D Conv-Net #466

Closed. kevinkevin556 closed this issue 1 month ago.

kevinkevin556 commented 7 months ago

Hi all,

Thank you for developing such a nice repo. I've been using it in many of my projects for network explainability, and it has been incredibly convenient!

Recently, I've been working with medical datasets using 3D-UNet. However, I noticed that 3D convolution is not yet supported in this library, and there are also issues like #351 requesting the feature. Therefore, I made several changes to GradCAM and BaseCAM to extend GradCAM to support 3D images.

Please let me know if you have any questions or suggestions regarding the changes I've implemented. I'm excited to contribute to this project and look forward to your feedback!
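For example, usage would look something like the sketch below. `Toy3DNet` is a hypothetical stand-in for a real 3D network such as a 3D-UNet encoder, and the sketch assumes the 3D support keeps the library's existing 2D call signature:

```python
import torch
import torch.nn as nn
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# Hypothetical toy model standing in for a real 3D conv-net.
class Toy3DNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.fc(self.pool(x).flatten(1))

model = Toy3DNet().eval()
# A (batch, channel, depth, height, width) volume, e.g. a CT scan.
input_tensor = torch.randn(1, 1, 24, 224, 224)

# Same usage pattern as the 2D case: target a 3D conv layer.
cam = GradCAM(model=model, target_layers=[model.features[-2]])
grayscale_cam = cam(input_tensor=input_tensor,
                    targets=[ClassifierOutputTarget(1)])
print(grayscale_cam.shape)  # expected: (1, 24, 224, 224) = (batch, D, H, W)
```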

jacobgil commented 6 months ago

Hey, sorry for the late reply. Thanks a lot for this functionality, this will be great to merge.

Is there a way to share an example use case for this: maybe some model and input image example, or an image example for the readme?

kevinkevin556 commented 6 months ago

@jacobgil Thanks for your reply!

> Is there a way to share an example use case for this: maybe some model and input image example, or an image example for the readme?

I added an animation of gradcam-visualized CT scans in the readme. Hope this can make it clearer.

Syax19 commented 5 months ago

@kevinkevin556 Thanks for providing the code for applying Grad-Cam on 3D CNN!

I have used your code to get the Grad-CAM outputs. My input 3D image tensor has size (1, 1, 24, 224, 224), representing (batch, channel, depth, height, width), and the resulting grayscale_cam output has size (1, 24, 224, 224). If I take one slice of the output, for example depth=11, i.e. outputs[0][11, :, :] in (depth, height, width) order, does it correspond to input image[:, 11, :, :] in (channel, depth, height, width) order? I ask because every depth slice of the output heatmap looked the same.

Looking forward to your reply, thanks!

kevinkevin556 commented 5 months ago

> I have used your code to get the Grad-CAM outputs. My input 3D image tensor has size (1, 1, 24, 224, 224), representing (batch, channel, depth, height, width), and the resulting grayscale_cam output has size (1, 24, 224, 224). If I take one slice of the output, for example depth=11, i.e. outputs[0][11, :, :] in (depth, height, width) order, does it correspond to input image[:, 11, :, :] in (channel, depth, height, width) order? I ask because every depth slice of the output heatmap looked the same.

@Syax19 Sorry for the late reply. I'm glad to hear that someone is using it 😄

Although I followed MONAI's convention of ordering the dimensions as (height, width, depth), the output dimensions still correspond to your input tensor, since there is no dimension swap when computing Grad-CAM.

Therefore, the grayscale_cam of size (1, 24, 224, 224) represents dimensions (batch, depth, height, width) in your case.
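For instance, to check the correspondence you can overlay one depth slice of the CAM on the matching input slice, reusing the `input_tensor` and `grayscale_cam` names from above. This uses the library's `show_cam_on_image` helper; the min-max normalization is an assumption about your CT intensity range:

```python
import numpy as np
from pytorch_grad_cam.utils.image import show_cam_on_image

d = 11  # depth slice to inspect

# Input slice: (batch, channel, depth, H, W) -> (H, W), scaled to [0, 1].
img = input_tensor[0, 0, d].detach().numpy()
img = (img - img.min()) / (img.max() - img.min() + 1e-8)
img_rgb = np.stack([img] * 3, axis=-1).astype(np.float32)  # grayscale -> RGB

# CAM slice: (batch, depth, H, W) -> (H, W), same depth index as the input.
cam_slice = grayscale_cam[0, d]

overlay = show_cam_on_image(img_rgb, cam_slice, use_rgb=True)  # uint8 HxWx3
```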

MoH-assan commented 3 months ago

@jacobgil Any update on this feature?

jacobgil commented 1 month ago

This is incredible functionality, thank you so much for contributing this, and sorry for being so late with my reply. I really want to merge this. The .gif file weighs 24 MB, which is a bit much; I will look into resizing it.

jacobgil commented 1 month ago

@kevinkevin556 Merged!! Better late than never. Thank you so much for this contribution!