raghakot / keras-vis

Neural network visualization toolkit for keras
https://raghakot.github.io/keras-vis
MIT License

possible regression in normalization of gradients #153

Open ju-w opened 5 years ago

ju-w commented 5 years ago

Hello! In commit https://github.com/raghakot/keras-vis/commit/40b27dfa3ecb84cdde5ec6b44251923c3266cc40, this:

https://github.com/raghakot/keras-vis/blob/14f3f61b2a07fa07164b2bb8ac1a6e5aed4936a6/vis/optimizer.py#L54-L55

was changed to this:

https://github.com/raghakot/keras-vis/blob/668b0e11dab93f3487f23c17e07f40554a8939e9/vis/optimizer.py#L62-L63

However, those two expressions are not equivalent; they produce very different results, likely because l2_normalize takes an additional axis= parameter that defaults to None. See https://github.com/petewarden/tensorflow_makefile/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/tf.nn.l2_normalize.md

I will post examples to show the difference; with the current configuration, convergence is slow and the loss stays high when using the ActivationMaximization loss only.
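
For reference, here is a minimal NumPy sketch (not from the original report; the shape is made up) of why the two expressions behave so differently: with the default axis=None, l2_normalize divides by the L2 norm taken over the entire tensor, while the previous code divides by the root-mean-square, so the two results differ by a factor of sqrt(number of elements).

import numpy as np

# Toy stand-in for `grads`; the shape is hypothetical.
g = np.random.randn(1, 56, 56, 256).astype("float32")
n = g.size
eps = 1e-7

# Previous keras-vis behaviour: divide by the RMS -> unit-RMS gradient.
rms_norm = g / (np.sqrt(np.mean(np.square(g))) + eps)

# K.l2_normalize(grads) with axis=None divides by the L2 norm over
# every element -> unit L2 norm for the whole tensor.
l2_norm = g / (np.sqrt(np.sum(np.square(g))) + eps)

# The two differ by a factor of sqrt(n); for a large feature map the
# l2-normalized gradient is tiny, so each ascent step barely moves.
print(np.sqrt(np.mean(np.square(rms_norm))))  # ~1.0
print(np.sqrt(np.mean(np.square(l2_norm))))   # ~1/sqrt(n), ~0.0011 here
print(np.sqrt(n))                              # the ratio between the two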

ju-w commented 5 years ago

VGG 19 - block3_conv4 - filter 211

grads = grads / (K.sqrt(K.mean(K.square(grads))) + K.epsilon())

Iteration: 1, named_losses: [('ActivationMax Loss', -3.853601)], overall loss: -3.8536009788513184
Iteration: 2, named_losses: [('ActivationMax Loss', -31.384222)], overall loss: -31.38422203063965
Iteration: 3, named_losses: [('ActivationMax Loss', -53.95486)], overall loss: -53.95486068725586
Iteration: 4, named_losses: [('ActivationMax Loss', -69.526985)], overall loss: -69.52698516845703
Iteration: 5, named_losses: [('ActivationMax Loss', -82.56883)], overall loss: -82.56883239746094
Iteration: 6, named_losses: [('ActivationMax Loss', -94.73487)], overall loss: -94.73487091064453
Iteration: 7, named_losses: [('ActivationMax Loss', -106.171295)], overall loss: -106.17129516601562
Iteration: 8, named_losses: [('ActivationMax Loss', -117.44212)], overall loss: -117.44212341308594
Iteration: 9, named_losses: [('ActivationMax Loss', -128.7226)], overall loss: -128.72259521484375
Iteration: 10, named_losses: [('ActivationMax Loss', -139.9744)], overall loss: -139.97439575195312

[image: vis01]

grads = K.l2_normalize(grads)

Iteration: 1, named_losses: [('ActivationMax Loss', -3.869104)], overall loss: -3.8691039085388184
Iteration: 2, named_losses: [('ActivationMax Loss', -4.0159173)], overall loss: -4.0159173011779785
Iteration: 3, named_losses: [('ActivationMax Loss', -4.1643486)], overall loss: -4.164348602294922
Iteration: 4, named_losses: [('ActivationMax Loss', -4.314634)], overall loss: -4.314633846282959
Iteration: 5, named_losses: [('ActivationMax Loss', -4.46636)], overall loss: -4.466360092163086
Iteration: 6, named_losses: [('ActivationMax Loss', -4.619669)], overall loss: -4.619668960571289
Iteration: 7, named_losses: [('ActivationMax Loss', -4.774126)], overall loss: -4.774126052856445
Iteration: 8, named_losses: [('ActivationMax Loss', -4.9301567)], overall loss: -4.930156707763672
Iteration: 9, named_losses: [('ActivationMax Loss', -5.087387)], overall loss: -5.0873870849609375
Iteration: 10, named_losses: [('ActivationMax Loss', -5.2458615)], overall loss: -5.245861530303955

[image: vis02]

keisen commented 5 years ago

@ju-w , thank you for your report! As you said, it seems the previous implementation works better than the current one. It is also the same as the implementation in Keras's example:

https://github.com/keras-team/keras/blob/7bee9a18d83899ce6dfd50c4883afc678139f4ad/examples/conv_filter_visualization.py#L57-L59

And it might be a way to fix #109 (unfortunately, #143 was not fixed). Could you contribute a fix for this issue?

@raghakot , I believe we should go back to your previous normalization implementation. But if you know of a case where the current implementation is better, please let us know.
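
For completeness, a minimal sketch of what reverting might look like, assuming the gradient-normalization step in vis/optimizer.py and mirroring the Keras conv_filter_visualization example linked above (function name is illustrative, not the actual keras-vis API):

from keras import backend as K

def normalize_grads(grads):
    # RMS normalization, as in the earlier keras-vis code and the Keras
    # conv_filter_visualization example: rescale the gradient to unit
    # root-mean-square so the ascent step size stays comparable across
    # layers and tensor sizes.
    return grads / (K.sqrt(K.mean(K.square(grads))) + K.epsilon())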