jacobgil / pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
https://jacobgil.github.io/pytorch-gradcam-book
MIT License

SSL Grad-cam #221

Closed. jaiswati closed this issue 2 years ago

jaiswati commented 2 years ago

Hello @jacobgil

Thanks a lot for these awesome repositories :) Is it possible to access the ViT backbone of a pretrained DINO network and then explain it with Grad-CAM methods, for example by accessing specific layers? Could you kindly explain how exactly this could be done?

Best regards, @jaiswati

jacobgil commented 2 years ago

First, for DINO you can just take the self-attention from the last layer and visualize it as a 2D image.
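A minimal sketch of that reshaping step, assuming you have already pulled the last-layer attention out of the model as an array of shape `(heads, tokens, tokens)` with the CLS token at index 0 and the patch tokens in row-major order. How you extract it depends on the ViT implementation. The function name `cls_attention_maps` and the toy random attention are illustrative only, not part of this repo or of DINO.

```python
import numpy as np

def cls_attention_maps(attn, grid_h, grid_w):
    """Turn CLS-to-patch attention into one 2D map per head.

    attn: array of shape (heads, tokens, tokens), where
          tokens == 1 + grid_h * grid_w and token 0 is CLS (assumption).
    """
    heads, tokens, _ = attn.shape
    if tokens != 1 + grid_h * grid_w:
        raise ValueError("token count does not match 1 + grid_h * grid_w")
    # Row 0 is the CLS query; drop column 0 (CLS attending to itself).
    cls_to_patches = attn[:, 0, 1:]
    return cls_to_patches.reshape(heads, grid_h, grid_w)

# Toy stand-in for real attention: 6 heads, 14x14 patches, softmax rows.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 1 + 14 * 14, 1 + 14 * 14))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

maps = cls_attention_maps(attn, 14, 14)   # one 14x14 map per head
mean_map = maps.mean(axis=0)              # average over heads
```

To overlay a map on the input, upsample it by the patch size (for example 16 for a ViT-S/16) before blending. Models with extra tokens (distillation or register tokens) would need a different slice than `1:`.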

Another option with this repo, since you don't have categories, is the "EigenCAM" method. It needs no target class and will find salient objects in the feature representations.
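To show roughly what EigenCAM computes, here is a from-scratch NumPy sketch rather than the repo's own code: it projects the activations onto their first principal component. It assumes ViT tokens of shape `(1 + h*w, dim)` with CLS first. The helper names, the toy data, and the sign-fixing step are my own illustrative choices; the sign of an SVD component is arbitrary, so some convention is needed.

```python
import numpy as np

def tokens_to_feature_map(tokens, grid_h, grid_w):
    """(1 + h*w, dim) ViT tokens -> (dim, h, w) map, dropping CLS (assumed at index 0)."""
    dim = tokens.shape[1]
    return tokens[1:].reshape(grid_h, grid_w, dim).transpose(2, 0, 1)

def eigen_cam(activations):
    """Project (channels, h, w) activations onto their first principal component."""
    c, h, w = activations.shape
    flat = activations.reshape(c, h * w).T        # one row per spatial location
    flat = flat - flat.mean(axis=0)               # center each channel
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[0]                           # score per location
    # Illustrative convention: make the strongest response positive.
    if abs(proj.min()) > proj.max():
        proj = -proj
    cam = np.maximum(proj, 0).reshape(h, w)       # keep positive evidence only
    return cam / (cam.max() + 1e-8)               # scale to [0, 1]

# Toy tokens: weak noise everywhere, a shared feature pattern inside a 4x4 blob.
rng = np.random.default_rng(0)
h = w = 8
dim = 16
tokens = 0.01 * rng.normal(size=(1 + h * w, dim))
mask = np.zeros((h, w), dtype=bool)
mask[2:6, 2:6] = True
patch_tokens = tokens[1:]                         # view into tokens
patch_tokens[mask.ravel()] += rng.uniform(0.5, 1.5, size=dim)

cam = eigen_cam(tokens_to_feature_map(tokens, h, w))
```

With the library itself, the CLS-dropping reshape is what you would pass as the `reshape_transform` argument when constructing a CAM object for a ViT, and the target layer is typically a late transformer block.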

Yet another option: a few minutes ago I added a notebook tutorial for visualizing concept embeddings in images: https://github.com/jacobgil/pytorch-grad-cam/blob/master/tutorials/Pixel%20Attribution%20for%20embeddings.ipynb

It works on models that output embedding feature vectors (like DINO or other SSL models), and searches for concept embeddings in the image. In SSL you would have to define these concepts yourself; they could be samples of the training images, for example.
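As a rough illustration of the scoring idea, the sketch below builds a concept embedding from a few sample embeddings and scores an image embedding by cosine similarity to it. This is an assumption-laden NumPy toy, not the notebook's code: the function names are hypothetical, averaging the samples is just one way to define a concept, and for a gradient-based CAM the same score would have to be written with torch operations so gradients can flow back to the target layer.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to unit length."""
    x = np.asarray(x, dtype=np.float64)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def concept_from_samples(sample_embeddings):
    """Hypothetical concept definition: mean of unit-normalized sample embeddings."""
    return l2_normalize(l2_normalize(sample_embeddings).mean(axis=0))

def similarity_to_concept(embedding, concept):
    """Cosine similarity between one image embedding and the concept embedding."""
    return float(l2_normalize(embedding) @ l2_normalize(concept))

# Toy 3-d embeddings standing in for real model outputs.
concept = concept_from_samples([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
score_match = similarity_to_concept([2.0, 0.0, 0.0], concept)     # close to the concept
score_unrelated = similarity_to_concept([0.0, 0.0, 1.0], concept)  # orthogonal to it
```

The CAM method then explains this similarity score instead of a class logit, so the heatmap shows where in the image the concept is found.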