GradCAM for Dual Attention ViT

jacobgil / pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.

https://jacobgil.github.io/pytorch-gradcam-book

MIT License


GradCAM for Dual Attention ViT #474

Open tayyabapucit opened 6 months ago

tayyabapucit commented 6 months ago

How can I use Grad-CAM with the Dual Attention ViT (davit_tiny.msft_in1k)? I tried a reshape transform with height, width set to 7, 7 and to 14, 14, but I still need to figure out which layer to target and how to reshape its output.

I've successfully used GradCAM, EigenCAM, and ScoreCAM with ResNet, DenseNets, VGG, ViT, and SwinVit. Results are amazing for all these models.

Any help?

tayyabapucit commented 6 months ago

Ok, I've figured it out. It worked with target_layers = [list(model.backbone.modules())[-13]]

and

def reshape_transform2(tensor, height=7, width=7):
    result = tensor.reshape(tensor.size(0),
                            height, width, tensor.size(2))

    # Bring the channels to the first dimension,
    # like in CNNs.
    result = result.transpose(2, 3).transpose(1, 2)
    return result

Am I doing it right?
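
One way to sanity-check a reshape transform like the one above is to run it on a dummy tensor and confirm both the output shape and where each token ends up. The sketch below is only an illustration: it assumes the target layer emits activations shaped (batch, tokens, channels) with tokens laid out row by row, and the batch size, token count, and channel count are made-up values, not taken from davit_tiny. The explicit token-count check is an addition that is not in the original function.

```python
import torch


def reshape_transform(tensor, height=7, width=7):
    """Turn (batch, tokens, channels) activations into (batch, channels, height, width)."""
    batch, tokens, channels = tensor.shape
    # Fail early if the layer does not produce a height x width token grid.
    if tokens != height * width:
        raise ValueError(
            f"expected {height * width} tokens for a {height}x{width} grid, got {tokens}"
        )
    result = tensor.reshape(batch, height, width, channels)
    # Bring the channels to the first dimension, like in CNNs.
    return result.permute(0, 3, 1, 2)


# Dummy activations: 2 images, 49 tokens, 768 channels (assumed sizes).
dummy = torch.arange(2 * 49 * 768, dtype=torch.float32).reshape(2, 49, 768)
out = reshape_transform(dummy)
print(out.shape)  # torch.Size([2, 768, 7, 7])

# Token k should land at row k // width, column k % width.
assert torch.equal(out[0, :, 1, 2], dummy[0, 1 * 7 + 2])
```

If the shape check fails on the real model, the chosen layer probably does not output a 7x7 token grid, and either the layer or the height and width need to change. In pytorch-grad-cam the function is passed through the reshape_transform argument of the CAM constructor, for example GradCAM(model=model, target_layers=target_layers, reshape_transform=reshape_transform).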