jacobgil / pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
https://jacobgil.github.io/pytorch-gradcam-book
MIT License
10.06k stars 1.52k forks

Per Class Activations #289

Closed mcever closed 2 years ago

mcever commented 2 years ago

Hi, thanks for the implementations of so many different CAM methods. I am interested in using the repo to generate class-specific CAMs. That is, I'd like to view the activation map for each individual class. I am hoping to use this for weakly supervised object detection, so ideally I think I'd use AblationCAM. Any guidance would be much appreciated.

Thanks, Austin

jacobgil commented 2 years ago

Hi,

You can apply a CAM on an image, there are examples in the README, or you can just run cam.py on an image.

To get a bounding box, you could, for example, threshold the grayscale_cam image to binarize it, and then find connected components.
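As a minimal sketch of that threshold-then-find-components idea (using SciPy's connected-component labelling rather than any helper from this repo; the 0.5 threshold is an arbitrary choice):

```python
import numpy as np
from scipy import ndimage

def cam_to_boxes(grayscale_cam, thresh=0.5):
    """Binarize a CAM heatmap and return one bounding box per connected component."""
    mask = grayscale_cam >= thresh            # binarize the [0, 1] heatmap
    labeled, _ = ndimage.label(mask)          # label connected components
    boxes = []
    for sl in ndimage.find_objects(labeled):  # one slice pair per component
        rows, cols = sl
        boxes.append((cols.start, rows.start, cols.stop, rows.stop))  # (x1, y1, x2, y2)
    return boxes

# Toy example: a single bright blob in an 8x8 map
cam = np.zeros((8, 8))
cam[2:5, 3:6] = 0.9
print(cam_to_boxes(cam))  # → [(3, 2, 6, 5)]
```

In practice you would pass in the `grayscale_cam` array the library returns, and perhaps keep only the largest component or filter tiny ones by area.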

I'm actually planning on adding examples for weakly supervised detection in the near future, but that will probably take a few weeks from now.

mcever commented 2 years ago

Hi Jacob,

Thanks for the reply. From what I can tell, cam.py and the underlying BaseCAM class and its child classes generate just one map, which I currently think of as more of an objectness map, since there appears to be no class discrimination. When I think of CAMs, I think of multiple per-class maps: for each class of interest, one can visualize where that individual class is activated (per-class activations). Here is an example from Zhou et al., "Learning Deep Features for Discriminative Localization":

[image: per-class CAM examples from Zhou et al.]

Maybe what's missing here is that I would need to retrain the CNNs with an additional GAP layer? Or do the newer AblationCAM and EigenCAM not discriminate between classes? Or have I just missed where these per-class maps are generated in the code?

Thanks for any guidance on where my confusion might be.

jacobgil commented 2 years ago

Hi, all of the methods implemented here (except EigenCAM) have class discrimination. One of the parameters passed to the CAM object is `targets`: which class we want to create the visualization for. The results for `targets = [ClassifierOutputTarget(281)]` and `targets = [ClassifierOutputTarget(300)]` will be different.