keisen / tf-keras-vis

Neural network visualization toolkit for tf.keras
https://keisen.github.io/tf-keras-vis-docs/
MIT License
311 stars, 45 forks

Examples: ScoreCAM should not use the linear-activation model_modifier #47

Closed (bersbersbers closed 3 years ago)

bersbersbers commented 3 years ago

https://arxiv.org/pdf/1910.01279.pdf says quite explicitly:

The relative output value (postsoftmax) after normalization is more reasonable to measure the relevance than absolute output value (pre-softmax). Thus, in Score-CAM, we represent weight as post-softmax value, so that the score can be rescaled into a fixed range. [...] Score-CAM with softmax can well distinguish two different categories, even though the prediction probability of ‘cat’ is lower than the probability of ‘dog’. Normalization operation equips Score-CAM with good class discrimination ability

So I'd say the example notebook at https://github.com/keisen/tf-keras-vis/blob/master/examples/attentions.ipynb should not suggest using model_modifier in cells [11] and [12].

See also algorithm 1, third line from the bottom, which may or may not imply another softmax step.
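The difference between the absolute (pre-softmax) and relative (post-softmax) value in the quoted passage can be illustrated with a small sketch. The logits below are made-up numbers for two hypothetical masked inputs, not output from any real model, and `softmax` is a plain helper written for this example rather than a tf-keras-vis function.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-class logits (class 0, class 1) for two masked inputs.
logits_mask_a = [10.0, 12.0]
logits_mask_b = [2.0, 1.0]

target = 0  # class whose score is used as the weight

# Pre-softmax: the raw logit of the target class (unbounded, absolute).
pre_a, pre_b = logits_mask_a[target], logits_mask_b[target]

# Post-softmax: the target-class probability (in [0, 1], relative to other classes).
post_a = softmax(logits_mask_a)[target]
post_b = softmax(logits_mask_b)[target]

print(f"pre-softmax : a={pre_a:.3f}  b={pre_b:.3f}")
print(f"post-softmax: a={post_a:.3f}  b={post_b:.3f}")
```

With these numbers, mask A has the larger raw logit for the target class, but mask B has the larger target-class probability, so the two choices of weight rank the masks differently.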

keisen commented 3 years ago

Hi, @bersbersbers. Thank you for pointing that out, and sorry for the late reply. As you said, the example has a problem, so I'm going to fix it by the end of this month.

Thanks!

keisen commented 3 years ago

Hi, @bersbersbers. I've run into an unexpected problem.

I modified the example so that it does NOT use the linear-activation model_modifier, and I got the image below, which I didn't expect.

The heatmaps for Bear and Assault rifle are too noisy and inaccurate. However, when I ran GradCAM again with the linear-activation model_modifier, I got the result below.

I'm NOT sure exactly why this happens. There may be a bug in tf-keras-vis, and investigating it could take some time. For now, I believe that, even if it is NOT strictly the correct way, the ScoreCAM example should use the linear-activation model_modifier.

Thanks!

bersbersbers commented 3 years ago

First, thanks for trying this!

I have also observed that ScoreCAMs get worse without the linear activation model modifier. However, I had a reason to keep this solution (without the modifier), since I am also interested in ScoreCAMs of classes other than the predicted one (let's say you want to know what you might need to change in the goldfish image to have it classified as a bear - compare Fig. 5 of the ScoreCAM paper). Maybe you want to look at these ScoreCAMs as well to understand whether the linear activation modifier should be used.

In my case (VGG16, binary classification), I usually (with the modifier) had the ScoreCAM of the wrong class being completely empty due to all scores being negative and the ReLU zeroing the resulting map. With one image, I even faced that situation with the correct class (predicted correctly with a softmax of 0.6/0.4) - the ScoreCAM of the correct class would be completely empty.
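A toy sketch of the empty-map effect described above, under simplifying assumptions: the final map is taken to be the ReLU of a weighted sum of non-negative activation maps, and all numbers are invented for illustration. `relu_weighted_sum` is a hypothetical helper, not part of tf-keras-vis.

```python
def relu_weighted_sum(weights, maps):
    """ReLU of the weighted sum of equally sized 2D maps (lists of lists)."""
    rows, cols = len(maps[0]), len(maps[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for w, m in zip(weights, maps):
        for r in range(rows):
            for c in range(cols):
                out[r][c] += w * m[r][c]
    # ReLU: negative sums are clipped to zero.
    return [[max(0.0, v) for v in row] for row in out]


# Two made-up, non-negative 2x2 activation maps.
activation_maps = [
    [[0.0, 1.0], [0.5, 0.2]],
    [[0.3, 0.0], [0.9, 0.4]],
]

# With linear (pre-softmax) scores, every weight can be negative...
negative_weights = [-3.0, -1.5]
# ...whereas post-softmax scores are probabilities and never negative.
probability_weights = [0.05, 0.20]

empty_map = relu_weighted_sum(negative_weights, activation_maps)
visible_map = relu_weighted_sum(probability_weights, activation_maps)

print(empty_map)
print(visible_map)
```

When every weight is negative, the weighted sum is non-positive everywhere and the ReLU clips the whole map to zero. With non-negative weights, any region that is active in at least one map survives.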

I agree that this may be worth additional investigation, as there may be some other things at play. Remember also that there may be another softmax step that needs to be applied within the algorithm (not at the end of the network):

See also algorithm 1, third line from the bottom, which may or may not imply another softmax step.
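If that line of the algorithm is read as a softmax across the per-channel scores (an assumption, since the thread leaves the interpretation open), it would look roughly like the sketch below. The scores are made-up numbers and `channel_softmax` is a hypothetical helper name.

```python
import math


def channel_softmax(scores):
    """Softmax across one score per activation channel (not across classes)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical target-class scores, one per channel, all negative
# (as can happen with a linear final activation).
channel_scores = [-3.0, -1.5, -0.5]

# Assumed reading: normalize across channels before the weighted sum.
channel_weights = channel_softmax(channel_scores)

print(channel_weights)
```

Under this reading, the weights would always be positive and sum to one, even when every raw score is negative, so the ordering of channels is kept while the sign problem goes away. Whether this matches the paper's intent is exactly the open question here.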