utkuozbulak / pytorch-cnn-visualizations

PyTorch implementation of convolutional neural network visualization techniques

Guided backprop on non-ReLU activations? #78

Closed by shijianjian 4 years ago

shijianjian commented 4 years ago

Hi,

This is a very useful repo to learn from. I am now wondering how to use guided backprop on a network that does not use ReLU.

Take EfficientNet, for example, which uses Swish instead of ReLU activations. What should I do to properly visualize it with guided backprop?

utkuozbulak commented 4 years ago

Hey,

To be honest with you, I have not seen a paper where the applicability of guided backprop to other activation functions is discussed in detail. I think you can use the implementation as it is and just change the targeted activation function from ReLU to whatever you are using (line 60). The filters coded in the forward and backward hooks should work without any issue.
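For example, a minimal sketch of that idea (not code from this repository; it assumes the Swish layers are `nn.SiLU` modules, while some EfficientNet implementations define their own Swish class, which you would pass in instead):

```python
import torch
import torch.nn as nn

def register_guided_hooks(model, activation_cls=nn.SiLU):
    """Register guided-backprop style hooks on every module of the given
    activation class instead of nn.ReLU (illustrative sketch only)."""
    forward_outputs = []  # forward activations, consumed in reverse order

    def forward_hook(module, inputs, output):
        forward_outputs.append(output)

    def backward_hook(module, grad_in, grad_out):
        # Same filtering as the ReLU version: keep gradient only where the
        # forward activation was positive, and clamp negative gradients
        corresponding_output = forward_outputs.pop()
        positive_mask = (corresponding_output > 0).float()
        return (positive_mask * torch.clamp(grad_in[0], min=0.0),)

    for module in model.modules():
        if isinstance(module, activation_cls):
            module.register_forward_hook(forward_hook)
            module.register_full_backward_hook(backward_hook)
```

After registering, a backward pass from the target class score should give you the filtered gradients on the input image, just like the ReLU version.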

shijianjian commented 4 years ago

I understand that there are not many resources on this topic. My situation is that my EfficientNet-B3 does have negative gradients, since Swish is used instead of ReLU. Thus, I am not confident that what I am generating is properly "guided".

In your code, you do:

```python
# positive forward outputs are set to 1; with ReLU (never negative) this yields a 0/1 mask
corresponding_forward_output[corresponding_forward_output > 0] = 1
# keep only the positive incoming gradients, scaled by that mask
modified_grad_out = corresponding_forward_output * torch.clamp(grad_in[0], min=0.0)
```

To my understanding, what happens here aligns with the ReLU activation and is used to cherry-pick the proper gradients. However, the Swish function allows negative output values. Is it still suitable to do the same?

I am confused about how I should change the backward hook to make the Swish activation "guided". For reference:

```python
swish = (1 / (1 + np.exp(-x))) * x  # i.e., x * sigmoid(x)
```
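As a side note, Swish with β = 1 is the same function PyTorch exposes as SiLU:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3.0, 3.0, steps=7)
swish = x * torch.sigmoid(x)  # x * 1 / (1 + exp(-x))
assert torch.allclose(swish, F.silu(x))
```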
utkuozbulak commented 4 years ago

Yep, you got it totally right. I think an implementation true to the proposed paper would also cherry-pick the gradients that align with positive outputs in the forward pass (see Figure 1 in https://arxiv.org/pdf/1412.6806.pdf), but like I said, it's tricky. You might as well try both approaches and see what works best in your case.
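To make the two options concrete, here is a rough sketch (again, not code from this repository) of how the two backward-hook variants for a Swish/SiLU layer could look; `forward_outputs` is the same hypothetical list filled by a forward hook, as in the sketch above:

```python
import torch

forward_outputs = []  # assumed to be filled by a forward hook on the Swish/SiLU modules

# Variant A: only clamp negative gradients, ignoring the sign of the
# Swish output (a looser, deconvnet-style filter)
def backward_hook_clamp_only(module, grad_in, grad_out):
    return (torch.clamp(grad_in[0], min=0.0),)

# Variant B: additionally zero the gradient wherever the Swish layer's
# forward output was not positive, which is closer to guided backprop
# as drawn in Figure 1 of the paper
def backward_hook_guided(module, grad_in, grad_out):
    corresponding_forward_output = forward_outputs.pop()
    positive_mask = (corresponding_forward_output > 0).float()
    return (positive_mask * torch.clamp(grad_in[0], min=0.0),)
```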