Thank you for pointing out this bug. I read the paper again, and I found that we should use `y_c` instead of `cost = tf.reduce_sum((prob - labels) ** 2)`.
In the paper, `y_c` is defined as the logit of class `c` before the softmax for the classification task. I think your implementation is correct, including filtering out the target class by element-wise multiplication with `y` (the ground-truth label; I'd avoid the variable name `y`, because it can confuse readers about whether it is the predicted value or the ground-truth value).
Just be careful not to reduce the batch dimension when you use batch inference.
I also found that gradient normalization is not necessary (it is not mentioned in the Grad-CAM paper). The normalization step was added when I wrote this code while referring to a Keras implementation of Grad-CAM.
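In case it helps, here is a minimal sketch of the corrected cost together with the un-normalized gradient. The tensor names (`conv_layer`, `one_hot_labels`, `logits`) and the tiny dense head are stand-ins for illustration only, not the actual graph in this repo:

```python
import tensorflow as tf

# Illustrative tensors; in the real code these come from the slim ResNet end_points.
one_hot_labels = tf.placeholder(tf.float32, [None, 1000])    # ground-truth one-hot labels
conv_layer = tf.placeholder(tf.float32, [None, 7, 7, 2048])  # last conv feature map
pooled = tf.reduce_mean(conv_layer, axis=(1, 2))             # global average pooling
logits = tf.layers.dense(pooled, 1000)                       # pre-softmax class scores

# y_c: pre-softmax logit of the target class, selected per example.
# Reducing only over the class axis keeps the batch dimension intact.
y_c = tf.reduce_sum(logits * one_hot_labels, axis=1)         # shape [batch]

# Use the raw gradient directly; no gradient normalization step.
grads = tf.gradients(y_c, conv_layer)[0]                     # shape [batch, 7, 7, 2048]
weights = tf.reduce_mean(grads, axis=(1, 2))                 # alpha_k^c, shape [batch, 2048]
```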
I also found a bug in `utils.py`: when we compute the weighted sum of the feature maps, we should use a zero-filled array as the initial value, not a one-filled array.
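A minimal sketch of the corrected weighted sum, assuming a `conv_output` of shape `[H, W, C]` and one weight per channel (the names are illustrative, not the exact ones in `utils.py`):

```python
import numpy as np

def grad_cam_map(conv_output, weights):
    """Weighted sum of feature maps; the accumulator starts at zero, not one."""
    cam = np.zeros(conv_output.shape[:2], dtype=np.float32)  # zero-filled initial value
    for k, w in enumerate(weights):
        cam += w * conv_output[:, :, k]
    return np.maximum(cam, 0)  # ReLU, as in the Grad-CAM paper
```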
For more information about the new update, see this pull request.
Any further correction or improvement pull requests are welcome :)
First of all, I really appreciate your implementation. It helped me a lot with getting started on my own Grad-CAM implementation.
I have a question regarding the cost function. I guess the cost function for computing the gradient should be changed from
```python
prob = end_points['predictions']  # after softmax
cost = tf.reduce_sum((prob - labels) ** 2)
```
to
```python
y = tf.placeholder(tf.float32, [1, 1000])
...
logit = end_points['resnet_v1_50/logits']  # before softmax
cost = tf.reduce_sum(logit * y)
```
Please let me know if I am misunderstanding the equation. Thanks.