raghakot / keras-vis

Neural network visualization toolkit for keras
https://raghakot.github.io/keras-vis
MIT License

Fix: Problem with keras ResNet50 CAM visualisation #53 #122

Closed keisen closed 5 years ago

keisen commented 6 years ago

Cause: In the modify_model_backprop method, a new ReLU function is assigned to the `activation` attribute of every layer that has one, i.e., every Convolution layer and every Activation layer. In Keras's ResNet50, the `activation` attribute of each Conv layer is the linear function, and the ReLU is applied by a separate Activation layer. Replacing this linear function with ReLU appears to be the cause of the gradient loss; in fact, if there is no separate Activation layer, the issue does not occur.
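For context, a minimal sketch of the layer pattern in question (this is not the keras-vis code; it just reproduces the block structure of Keras's ResNet50):

```python
from keras.layers import Activation, BatchNormalization, Conv2D, Input

# In Keras's ResNet50 the Conv2D layers keep the default linear activation,
# and the ReLU non-linearity comes from a separate Activation layer.
inputs = Input(shape=(224, 224, 3))
x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(inputs)  # activation = linear
x = BatchNormalization(name='bn_conv1')(x)
x = Activation('relu', name='activation_1')(x)

# modify_model_backprop overwrites the `activation` attribute of every layer
# that has one. For the Conv2D above this turns its linear activation into a
# ReLU, so once the model is rebuilt the data passes through ReLU twice
# (once inside the Conv2D, once in the Activation layer).
```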

Detail

Please refer to the comment below ( https://github.com/raghakot/keras-vis/pull/122#issuecomment-406935791 ).

@raghakot, since this is a bug fix, I didn't add tests or documentation. Instead, I added an example.

https://github.com/keisen/keras-vis/blob/bugfix/%2353/examples/resnet/attention.ipynb

( MODIFIED on 15 Aug 2018 )

keisen commented 6 years ago

@raghakot, sorry, I forgot to fix the test code. I will fix it, so please give me some time.

keisen commented 6 years ago

Hi, @raghakot

I'm now fixing the test code, and I have a question about the following implementation.

https://github.com/raghakot/keras-vis/blob/e019cc43ce6c00b2151941b18dbad63164ad632a/vis/backend/tensorflow_backend.py#L87-L100

As a test, I removed this implementation (the for-each loop), but GuidedBackprop still works fine (there is no problem with Keras's ResNet50).

What is this implementation needed for? Is it important to use tf.nn.relu?

Thanks.

ADDED on 23 Jul 2018:

> What is this implementation needed for? Is it important to use tf.nn.relu?

I think I now understand this, maybe: it is there to handle activations other than ReLU, isn't it?

I think gradient_override_map should be used instead. I wrote an example in the comment below.

keisen commented 6 years ago

I'm sorry, I made an operation mistake... ;-(

keisen commented 6 years ago

@raghakot, please review this PR and point out any problems. Also, please let me know if you have a good idea for the test.

Fixes

The problem can be avoided by modifying the following implementation.

https://github.com/raghakot/keras-vis/blob/e019cc43ce6c00b2151941b18dbad63164ad632a/vis/backend/tensorflow_backend.py#L87-L100

The problem is the condition on line 89. You can avoid it by excluding keras.activations.linear from the condition, i.e., change line 89 to `if hasattr(layer, 'activation') and layer.activation != keras.activations.linear:`.
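As a rough sketch of that change (the helper name is mine and the surrounding loop is paraphrased from the linked lines, not copied verbatim):

```python
import keras
import tensorflow as tf


def _swap_relu_activations(model):
    """Paraphrase of the loop at tensorflow_backend.py#L87-L100 with the extra
    condition: layers whose activation is linear (e.g. the Conv layers in
    ResNet50, which rely on a separate Activation layer) are left untouched.
    """
    for layer in model.layers:
        if hasattr(layer, 'activation') and layer.activation != keras.activations.linear:
            layer.activation = tf.nn.relu
```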

However, in this PR I deleted all of the implementation above. If you want to support activations other than ReLU, you should register those activations via gradient_override_map instead of replacing each layer's activation with ReLU (e.g., gradient_override_map({'Relu': backprop_modifier, 'Elu': backprop_modifier})). Adding such activations is not implemented in this PR.
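To make that alternative concrete, here is a hedged sketch (not the code in this PR) of how gradient_override_map can map several activation ops onto one registered gradient; the gradient name 'BackpropModifier' and the guided-backprop-style gradient body are illustrative:

```python
import tensorflow as tf


@tf.RegisterGradient('BackpropModifier')
def _backprop_modifier(op, grad):
    # Guided-backprop style: only pass gradients that are positive and that
    # flow into positive activations.
    dtype = op.outputs[0].dtype
    return tf.cast(grad > 0., dtype) * tf.cast(op.outputs[0] > 0., dtype) * grad


graph = tf.get_default_graph()
# Instead of rewriting each layer's activation to ReLU, override the gradient
# of every activation op we care about.
with graph.gradient_override_map({'Relu': 'BackpropModifier',
                                  'Elu': 'BackpropModifier'}):
    x = tf.placeholder(tf.float32, shape=(None, 10))
    y = tf.nn.elu(tf.nn.relu(x))  # both ops now use the overridden gradient
    grads = tf.gradients(y, x)
```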

Consultation about the test

I cannot create a test that reproduces the problem because we have not identified its root cause.

If the code is changed so that it does not replace the activation function, the problem can be avoided. So the cause seems to be that the ReLU function is applied twice in a row.

However, ReLU(ReLU(x)) is just the same as ReLU(x), so I don't think this should have an adverse effect on the results. The Batch Normalization and Add (merge) layers also do not seem to cause this problem.
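A quick check of that idempotence claim with NumPy (just for illustration):

```python
import numpy as np

x = np.random.randn(10000)
relu = lambda v: np.maximum(v, 0.)
assert np.array_equal(relu(relu(x)), relu(x))  # ReLU(ReLU(x)) == ReLU(x)
```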

Do you have any good ideas for a better test?

Regards

keisen commented 5 years ago

Hi, @raghakot.

I've thought about this PR and have decided to merge it.

If it causes any problems, please revert this PR.

P.S. If you have any ideas about tests for this PR, please respond at https://github.com/raghakot/keras-vis/pull/122#issuecomment-406935791 .