Hi @yangfantrinity, right now we don't have support for GradCAM++, but we will consider adding it in the future.
You should be able to modify the existing GradCAM implementation to include the changes proposed in GradCAM++. For example, when applying an exponential to the output logit (Y^c = exp(S^c)), the proposed GradCAM++ weight coefficient is:

alpha_ij^kc = (dS^c / dA_ij^k)^2 / [ 2 * (dS^c / dA_ij^k)^2 + sum_{a,b} A_ab^k * (dS^c / dA_ij^k)^3 ]
This can be incorporated into the GradCAM implementation by replacing line 204 in attr/_core/layer/grad_cam.py with something like:
```python
# torch and F (torch.nn.functional) are already imported in grad_cam.py.
# GradCAM++ alpha coefficients for the exp(logit) case, per the formula above.
squared_layer_gradients = tuple(
    layer_grad ** 2 for layer_grad in layer_gradients
)
cubed_layer_gradients = tuple(layer_grad ** 3 for layer_grad in layer_gradients)
# Sum activations over the spatial dimensions (dims 2, 3, ... of each map).
summed_acts = tuple(
    torch.sum(
        layer_eval,
        dim=tuple(x for x in range(2, len(layer_eval.shape))),
        keepdim=True,
    )
    for layer_eval in layer_evals
)
alphas = tuple(
    squared_layer_gradient
    / ((2 * squared_layer_gradient) + (cubed_layer_gradient * summed_act))
    for squared_layer_gradient, cubed_layer_gradient, summed_act in zip(
        squared_layer_gradients, cubed_layer_gradients, summed_acts
    )
)
# Replace NaNs (from 0 / 0 divisions) with 0.
for alpha in alphas:
    alpha[alpha != alpha] = 0
# Channel weights: spatial sum of alpha * ReLU(gradient).
summed_grads = tuple(
    torch.sum(
        alpha * F.relu(layer_grad),
        dim=tuple(x for x in range(2, len(layer_grad.shape))),
        keepdim=True,
    )
    for alpha, layer_grad in zip(alphas, layer_gradients)
)
```
Note that this applies when taking gradients with respect to an exponential applied to the logit (softmax would be different, as described in the paper). This proposed change hasn't been tested thoroughly and may have some issues, but it should help you get started with adapting the existing GradCAM implementation for GradCAM++.
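For context, once the snippet above is in place in grad_cam.py, the patched class is called just like the regular LayerGradCam. Here is a minimal usage sketch, where `model`, `model.layer4`, `inputs`, and `target_class` are placeholder names rather than anything provided by Captum:

```python
from captum.attr import LayerGradCam, LayerAttribution

# Placeholder model / layer / inputs; assumes grad_cam.py has been patched as above.
grad_cam_pp = LayerGradCam(model, model.layer4)
attr = grad_cam_pp.attribute(inputs, target=target_class, relu_attributions=True)

# Optionally upsample the layer attribution to the input's spatial size for visualization.
upsampled = LayerAttribution.interpolate(attr, inputs.shape[2:])
```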
For SmoothGrad, you should be able to use NoiseTunnel directly on top of GradCAM with the modification. Hope this helps!
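A hedged sketch of that combination, reusing the placeholder names above (the sample-count argument is `nt_samples` in recent Captum releases; older versions call it `n_samples`):

```python
from captum.attr import LayerGradCam, NoiseTunnel

# Wrap the (patched) LayerGradCam in NoiseTunnel; with nt_type="smoothgrad" it
# averages attributions over noisy copies of the input, i.e. SmoothGrad.
grad_cam_pp = LayerGradCam(model, model.layer4)  # placeholder model / layer
smooth_grad_cam_pp = NoiseTunnel(grad_cam_pp)
attr = smooth_grad_cam_pp.attribute(
    inputs,               # placeholder input batch
    nt_type="smoothgrad",
    nt_samples=25,        # `n_samples` in older Captum versions
    stdevs=0.15,
    target=target_class,  # placeholder target class
)
```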
Thank you for your detailed reply and the code, @vivekmig.
If I may ask two more questions: 1) Is it safe to assume exp(logit)? Is the exponential applied to the output logit mainly for ease of computation? 2) What should the assumption be for EfficientNet? After the last convolution layer there are normalization, average pooling, and dropout layers.
Before I saw your reply here, which is definitely a more efficient and robust way to implement GradCAM++, I had added the following code after line 202:
```python
gradients = layer_gradients[0]
activations = layer_evals[0]
b, k, u, v = gradients.size()

alpha_num = gradients.pow(2)
alpha_denom = alpha_num.mul(2) + activations.mul(gradients.pow(3)).view(
    b, k, u * v
).sum(-1).view(b, k, 1, 1)
# Guard against division by zero.
alpha_denom = torch.where(
    alpha_denom != 0.0, alpha_denom, torch.ones_like(alpha_denom)
)
alpha = alpha_num.div(alpha_denom + 1e-7)

positive_gradients = F.relu(logit.exp() * gradients)  # ReLU(dY/dA) == ReLU(exp(S) * dS/dA)
weights = (alpha * positive_gradients).view(b, k, u * v).sum(-1).view(b, k, 1, 1)

undo_gradient_requirements(inputs, gradient_mask)
weights = (weights,)  # wrap in a one-element tuple to match the tuple-based code below
```
and also replaced lines 215 to 218 with:
```python
scaled_acts = tuple(
    torch.sum(weight * layer_eval, dim=1, keepdim=True)
    for weight, layer_eval in zip(weights, layer_evals)
)
```
This idea is borrowed from https://github.com/1Konny/gradcam_plus_plus-pytorch.
I will implement the code you suggested here.
Hi @yangfantrinity, no problem!
Your modification looks good too. The main difference is that we maintain gradients and activations as tuples to support layers with multiple inputs / outputs, but for your use case (and GradCAM / image classification generally), layers usually have a single output.
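For illustration, a tiny hypothetical sketch of that tuple convention (`as_tuple` is a made-up helper, not Captum code): single-output layers simply yield one-element tuples, so indexing with [0] as in your snippet is equivalent there, while the tuple form also covers layers with multiple inputs or outputs.

```python
# Hypothetical helper, only to illustrate the tuple convention discussed above.
def as_tuple(tensors):
    return tensors if isinstance(tensors, tuple) else (tensors,)

layer_gradients = as_tuple(layer_gradients)  # (grad,) for a single-output layer
layer_evals = as_tuple(layer_evals)          # (activation,)

# Downstream code can then always zip over the tuples, regardless of how many
# outputs the target layer has, instead of indexing [0] directly.
```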
Thank you @vivekmig, it was very nice to have this discussion with you here. @NarineK, shall I close this question?
Thank you everyone! Closing this issue!
Does the package already have a function that supports GradCAM++ (https://arxiv.org/pdf/1710.11063.pdf)? I know `NoiseTunnel` is the SmoothGrad equivalent.