utkuozbulak / pytorch-cnn-visualizations

Pytorch implementation of convolutional neural network visualization techniques

Guided Backprop implementation #59

Closed Etienne-Meunier closed 4 years ago

Etienne-Meunier commented 4 years ago

Hi, thank you very much for this awesome repo! In the Guided Backprop implementation (in guided_backprop.py) I don't see why it is necessary to block the gradients where the neuron didn't activate (the part using the stored forward ReLU activations), because the derivative of that neuron's output with respect to its input is already 0 there (saturation region), so the gradient is blocked anyway. I tried a new version of the code, replacing the two hooks with a single backward hook that blocks negative gradients:

def relu_backward_hook_function_bis(module, grad_in, grad_out):
    """
    If there is a negative gradient, change it to zero
    """
    # Clamp the incoming gradient so that only positive values pass through
    modified_grad_out = torch.clamp(grad_in[0], min=0.0)
    return (modified_grad_out,)

and I get exactly the same results as the original implementation for all the examples provided in the repo. Maybe I missed something that would change the results for other cases/networks, but I have trouble seeing what, so I wanted to get your opinion on it. Thanks
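A minimal, self-contained sketch of the comparison being described (not the repo's exact code; the toy model and function names below are made up for illustration). It registers either the two-hook variant, which masks by the stored forward activation and clamps negative gradients, or the backward-only variant on every ReLU, then checks whether both give the same input gradient:

import copy
import torch
import torch.nn as nn


def register_two_hook_variant(model):
    """Original-style pair of hooks: remember each ReLU output on the forward
    pass, then zero gradients where the forward output was non-positive and
    where the incoming gradient is negative."""
    forward_outputs = []

    def forward_hook(module, ten_in, ten_out):
        forward_outputs.append(ten_out)

    def backward_hook(module, grad_in, grad_out):
        # Backward hooks fire in reverse order, so pop the matching activation
        mask = (forward_outputs.pop() > 0).float()
        return (mask * torch.clamp(grad_in[0], min=0.0),)

    for m in model.modules():
        if isinstance(m, nn.ReLU):
            m.register_forward_hook(forward_hook)
            m.register_backward_hook(backward_hook)  # deprecated hook API, as used in this thread


def register_backward_only_variant(model):
    """Simplified variant from this issue: only clamp negative gradients."""
    def backward_hook(module, grad_in, grad_out):
        return (torch.clamp(grad_in[0], min=0.0),)

    for m in model.modules():
        if isinstance(m, nn.ReLU):
            m.register_backward_hook(backward_hook)


if __name__ == '__main__':
    torch.manual_seed(0)
    base = nn.Sequential(
        nn.Conv2d(3, 8, 3), nn.ReLU(),
        nn.Conv2d(8, 4, 3), nn.ReLU(),
        nn.Flatten(), nn.Linear(4 * 28 * 28, 10),
    )
    model_a, model_b = copy.deepcopy(base), copy.deepcopy(base)
    register_two_hook_variant(model_a)
    register_backward_only_variant(model_b)

    x_a = torch.randn(1, 3, 32, 32, requires_grad=True)
    x_b = x_a.detach().clone().requires_grad_(True)
    model_a(x_a)[0, 0].backward()
    model_b(x_b)[0, 0].backward()
    # For plain (non in-place) ReLUs both variants should agree
    print(torch.allclose(x_a.grad, x_b.grad))

The intuition behind the equivalence claim: for a standard ReLU, autograd's own backward pass has already multiplied the incoming gradient by the (input > 0) mask before the hook sees grad_in, so clamping alone reproduces the two-hook result.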

utkuozbulak commented 4 years ago

You can read the relevant discussion here: #36, in the 'edit' part of my last comment.

Etienne-Meunier commented 4 years ago

Thank you very much for your answer. However, the backpropagation through the ReLU seems to enforce the constraint by itself: if we check Fig. 1 of the article, the backward pass (upper right) gives 0 at the points where the forward pass outputs were negative. Even in the bottom left we can see that guided backpropagation seems to combine plain backpropagation and the deconvnet without any other condition. In your comment in #36 you seem to mention a network where there is a difference; do you have an implementation where we can observe that? It would really help me understand. Thanks a lot
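For reference, the three backward rules compared in Fig. 1 of the guided backprop paper (Springenberg et al., "Striving for Simplicity") can be paraphrased as below. This is an illustrative sketch rather than code from the repo, with x standing for the ReLU's forward input and g for the gradient arriving from the layer above:

import torch

def backprop_rule(x, g):
    # Plain backpropagation: pass the gradient where the forward input was positive
    return g * (x > 0).float()

def deconvnet_rule(x, g):
    # Deconvnet: pass the gradient where the gradient itself is positive,
    # ignoring the forward activation
    return g * (g > 0).float()

def guided_backprop_rule(x, g):
    # Guided backprop: apply both conditions at once
    return g * (x > 0).float() * (g > 0).float()

The question in this thread is whether the (x > 0) factor has to be enforced explicitly in the hook, given that autograd's ReLU backward already multiplies the incoming gradient by that mask.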

utkuozbulak commented 4 years ago

I remember finding a weird architecture where it would be different, but I can't recall which one it was at the moment (maybe the architecture in the Network in Network paper that contains mlpconv layers).