ramprs / grad-cam

[ICCV 2017] Torch code for Grad-CAM
https://arxiv.org/abs/1610.02391

How to compute the gradient of the score for a class? And what is the relationship between the gradient and the visualization? #10

Closed: zhangrong1722 closed this issue 6 years ago

zhangrong1722 commented 6 years ago

In the Grad-CAM paper, the authors use the gradient of the score y^c for class c with respect to the feature maps A^k of a convolutional layer (the partial derivative of y^c w.r.t. A^k) to obtain the neuron importance weights. But how is the gradient of the score y^c for class c computed? And why use gradients at all?

ramprs commented 6 years ago

How to compute the gradient of the score for class c, y^c?

In order to compute the gradient of the score for any class c with respect to the convolutional feature maps, we set the gradient at the pre-softmax layer (fc8 in the case of VGG) to a one-hot encoding of the target class c and backpropagate that gradient to the last convolutional layer (in most frameworks this can be done by simply calling backward).
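For concreteness, here is a minimal sketch of that step in PyTorch (the repo itself is Torch/Lua, so this is not its code); the layer index, hook names, and random input are illustrative assumptions for torchvision's VGG-16:

```python
# Hedged sketch: capture A^k and d(y^c)/dA^k via hooks, then backprop a
# one-hot gradient from the pre-softmax scores (the fc8 equivalent).
import torch
from torchvision import models

model = models.vgg16(pretrained=True).eval()  # recent torchvision prefers weights=...

activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["A"] = out          # feature maps A^k, shape (1, K, H, W)

def bwd_hook(module, grad_in, grad_out):
    gradients["dA"] = grad_out[0]   # d(y^c)/dA^k, same shape as A^k

last_conv = model.features[28]      # last Conv2d in VGG-16 (index may differ per model)
last_conv.register_forward_hook(fwd_hook)
last_conv.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)     # placeholder input image
logits = model(x)                   # pre-softmax class scores

c = logits.argmax(dim=1)            # target class c (or any class you choose)
one_hot = torch.zeros_like(logits)
one_hot[0, c] = 1.0
logits.backward(gradient=one_hot)   # backprop the one-hot gradient to the conv layer
```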

Why do we use Gradients in Grad-CAM?

Grad-CAM uses the fact that later convolutional layers capture more semantic information, and that the importance of a neuron for class c can be determined by summing up (global average pooling) the gradients obtained above over the spatial locations. To explain a particular image, these importance weights are used to form a weighted combination of the convolutional feature maps A^k, followed by a ReLU, which yields the Grad-CAM heat map.
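As a hedged sketch of that combination (the pooling and weighted-sum steps described in the paper), with stand-in tensors for the activations and gradients captured in the previous snippet; the shapes are placeholders for VGG-16's last conv layer:

```python
# Hedged sketch: alpha_k^c = global-average-pooled gradients, then
# L_Grad-CAM = ReLU(sum_k alpha_k^c * A^k).
import torch
import torch.nn.functional as F

A  = torch.randn(1, 512, 14, 14)            # feature maps A^k from the last conv layer
dA = torch.randn(1, 512, 14, 14)            # gradients d(y^c)/dA^k from backward()

alpha = dA.mean(dim=(2, 3), keepdim=True)   # global average pooling -> weights alpha_k^c
cam = F.relu((alpha * A).sum(dim=1))        # weighted combination + ReLU, shape (1, 14, 14)

# Upsample to the input resolution to overlay on the image.
cam = F.interpolate(cam.unsqueeze(1), size=(224, 224),
                    mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```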

Difference between visualization and gradient:

In the Grad-CAM paper, when we refer to a visualization we mean the visual explanation (the Grad-CAM heat map or the Guided Grad-CAM gradients). Gradients are much more general and are used in many scenarios, e.g., for training or for generating explanations (as in the case of Grad-CAM).

zhangrong1722 commented 6 years ago

Thank you!

hnguyentt commented 4 years ago

Hello @ramprs

Is it still acceptable to backpropagate from the softmax rather than the pre-softmax layer to the last convolutional layer?

I tested backpropagation from both the softmax layer and the logits and got quite similar results.
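For anyone curious about why the two choices can give similar maps, here is a small self-contained sketch (a toy model, not the repo's code) comparing the gradients with respect to the conv features when backpropagating from the logit versus the softmax probability:

```python
# Hedged sketch: gradient wrt conv features from the logit y^c vs. the
# softmax probability p^c, on a toy model with random weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(), nn.Linear(8 * 6 * 6, 5))
x = torch.randn(1, 3, 8, 8)

feats = model[0](x)                               # conv feature maps A^k
logits = model[3](model[2](model[1](feats)))
c = logits.argmax(dim=1).item()

# (a) backprop from the logit y^c
g_logit = torch.autograd.grad(logits[0, c], feats, retain_graph=True)[0]

# (b) backprop from the softmax probability p^c
probs = torch.softmax(logits, dim=1)
g_prob = torch.autograd.grad(probs[0, c], feats)[0]

# d(p^c)/dy_j = p^c * (1[j=c] - p_j): the softmax gradient rescales the logit
# gradient and subtracts a probability-weighted mix of the other classes'
# gradients, which is why the two heat maps often look similar in practice.
cos = F.cosine_similarity(g_logit.flatten(), g_prob.flatten(), dim=0)
print(f"cosine similarity between the two gradients: {cos.item():.3f}")
```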