utkuozbulak / pytorch-cnn-visualizations

Pytorch implementation of convolutional neural network visualization techniques
MIT License

Wrong implementation of Guided Backprop? #36

Closed: alexrakowski closed this issue 5 years ago

alexrakowski commented 5 years ago

Isn't there a bug in the way negative gradients are filtered out for ReLUs in Guided Backprop?

What you do is `torch.clamp(grad_in[0], min=0.0)`,

while the paper states:

> rather than masking out values corresponding to negative entries of the top gradient (’deconvnet’) or bottom data (backpropagation), we mask out the values for which at least one of these values is negative

It seems to me that you forgot to include the top gradient.
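As a minimal sketch of the rule from the paper (not your hook-based code, just a standalone autograd Function with hypothetical names), both masks would be applied in the backward pass:

```python
import torch
from torch.autograd import Function

class GuidedBackpropReLU(Function):
    """ReLU whose backward pass zeroes the gradient wherever the forward
    input was negative (plain backprop mask) or the incoming gradient is
    negative (deconvnet mask), i.e. the guided backprop rule."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        positive_input = (x > 0).to(grad_output.dtype)            # 'bottom data' mask
        positive_grad = (grad_output > 0).to(grad_output.dtype)   # 'top gradient' mask
        return grad_output * positive_input * positive_grad
```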

utkuozbulak commented 5 years ago

> We call this method guided backpropagation, because it adds an additional guidance signal from the higher layers to usual backpropagation. This prevents backward flow of negative gradients, corresponding to the neurons which decrease the activation of the higher layer unit we aim to visualize.

Am I missing something?

alexrakowski commented 5 years ago

I think you should also be masking the gradients using the output (top) gradients, in a similar manner to what is done here
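A toy example (a single ReLU with made-up numbers) of how the three masking rules from the paper differ:

```python
import torch

x = torch.tensor([-1.0, 2.0, 3.0])         # forward input to the ReLU ('bottom data')
grad_out = torch.tensor([0.5, -0.5, 1.0])  # gradient arriving from above ('top gradient')

backprop = grad_out * (x > 0).float()                          # tensor([ 0.0, -0.5,  1.0])
deconvnet = grad_out * (grad_out > 0).float()                  # tensor([0.5, 0.0, 1.0])
guided = grad_out * (x > 0).float() * (grad_out > 0).float()   # tensor([0.0, 0.0, 1.0])
```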

utkuozbulak commented 5 years ago

Hey, you mean the gradients should also be filtered by the non-zero forward output of the layers? Having re-read the paper (especially Figure 1), I see what you mean. In fact, the approach implemented in the repository looks closer to deconvolution (Zeiler & Fergus).

I will go over it during the weekend; there are a few more things I want to change, so I might as well implement them all together.

Thanks.

Edit: I was experimenting with a few things. For AlexNet and VGG the output does not change, because a positive output in the forward pass also corresponds to a positive output in the backward pass once the ReLUs are updated. In the paper they use a different architecture based on NiN, which is built on mlpconv layers, and that is where it makes a difference. Nevertheless, I will update the code, even though the results stay the same. (PS: I wasn't actually convinced that this is the case before doing additional experiments; you can play around to convince yourself that in Figure 1 b) of the GBP paper, 'Backward pass: deconvnet' and 'Backward pass: guided backpropagation' actually lead to the same output for regular conv architectures like AlexNet and VGG.)

Here are the changes: https://github.com/utkuozbulak/pytorch-cnn-visualizations/commit/5d052f1043362e83b4934d56343f05c28894a807
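A rough sketch of that kind of masking with module hooks (hypothetical names, not the exact code from the commit; assumes the ReLUs are constructed with `inplace=False`):

```python
import torch
import torch.nn as nn

class GuidedReLUHooks:
    """Hypothetical helper: store each ReLU's forward output and use it to
    mask the clamped gradient flowing back through that ReLU."""

    def __init__(self, model):
        self.relu_outputs = []
        for module in model.modules():
            if isinstance(module, nn.ReLU):
                module.register_forward_hook(self._save_output)
                module.register_full_backward_hook(self._guided_backward)

    def _save_output(self, module, inputs, output):
        # Remember where this ReLU fired; backward hooks run in reverse order,
        # so the last stored output belongs to the next backward call.
        self.relu_outputs.append(output)

    def _guided_backward(self, module, grad_input, grad_output):
        positive_output = (self.relu_outputs.pop() > 0).float()   # forward (bottom) mask
        positive_grad = torch.clamp(grad_output[0], min=0.0)      # top-gradient mask
        return (positive_output * positive_grad,)
```

After attaching the hooks, calling `backward()` on the score of the target class gives the guided gradients at the input image.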