marcoancona / DeepExplain

A unified framework of perturbation and gradient-based attribution methods for Deep Neural Networks interpretability. DeepExplain also includes support for Shapley Values sampling. (ICLR 2018)
https://arxiv.org/abs/1711.06104
MIT License

Incorrect Implementation of Gradient*Input and Integrated Gradients #9

Closed: agakshat closed this issue 6 years ago

agakshat commented 6 years ago

I believe the implementation of the gradient*input and integrated gradients methods (along with the interpretation of these methods in the accompanying paper) is incorrect. Taking only the gradient*input method as an example, you say in Section 2.2 that

The attribution is computed taking the (signed) partial derivatives of the output with respect to the input and multiplying them with the input itself

and implement it by doing a regular backpropagation step on the network, computing the gradients with respect to the input and then simply multiplying them with the input. However, Shrikumar et al. (2016) define gradient*input as a 'relevance score' for each layer: instead of doing regular backpropagation, the relevance score (gradient) received at each layer has to be multiplied with the activation (input) of that layer, and that product, not the raw gradient, is what gets backpropagated. Effectively, the gradient calculation process needs to be overridden, similar to how you do it in the LRP and DeepLIFT implementations.
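To make the two readings concrete, here is a minimal NumPy sketch; the function names, the `grad_fn` callable, and the toy network of plain linear layers are illustrative assumptions, not DeepExplain's API. Reading (a) does one ordinary backward pass and multiplies by the input only at the input layer, while reading (b) rescales the backpropagated signal by the activation at every layer.

```python
import numpy as np

# (a) Gradient*Input as implemented in this repo: one regular backward pass,
#     then an element-wise product with the input at the input layer only.
def gradient_times_input(grad_fn, x):
    """`grad_fn(x)` is assumed to return dF/dx from standard backpropagation."""
    return x * grad_fn(x)

# (b) The layer-wise variant described above, on a toy network of linear layers:
#     the backpropagated signal is multiplied by each layer's input activation.
def layerwise_relevance(weights, activations, output_grad):
    """`weights[l]` is the weight matrix of layer l and `activations[l]` is the
    input to layer l (so `activations[0]` is the network input)."""
    relevance = output_grad
    for W, a in zip(reversed(weights), reversed(activations)):
        relevance = (W.T @ relevance) * a  # rescale by the activation at every layer
    return relevance  # relevance assigned to the network input
```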

A similar argument holds for the Integrated Gradients case. Please correct me if I'm mistaken.

marcoancona commented 6 years ago

Hi! You are mistaken, indeed. About Integrated Gradients (IG), the definition in Eq. 1 in Sundararajan et al. (2017) is very clear: you multiply the original input by the integral of the gradient of the model function. About Gradient*Input, both Shrikumar et al. (2016) and Kindermans et al. (2016) state it is equivalent to e-LRP when epsilon is set to zero and ReLU is used. If you work through the proof, you see that the equivalence holds precisely when the gradient is multiplied by the input only at the input layer, hence the definition that we also use.
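For reference, Eq. 1 can be approximated with a simple Riemann sum over the straight-line path from the baseline to the input. The sketch below is illustrative only (the names and the `grad_fn` callable are assumptions, not DeepExplain's API); note that the averaged gradients are multiplied by the input difference exactly once, at the end, with no per-layer rescaling.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline=None, steps=50):
    """Approximate Eq. 1 of Sundararajan et al. (2017):
    IG_i(x) = (x_i - x'_i) * integral_0^1 dF(x' + a*(x - x'))/dx_i da,
    using a midpoint Riemann sum over `steps` points on the straight-line path.
    `grad_fn(z)` is assumed to return dF/dz for a single input `z`."""
    if baseline is None:
        baseline = np.zeros_like(x)                 # common choice of baseline x'
    alphas = (np.arange(steps) + 0.5) / steps       # midpoints in (0, 1)
    grads = [grad_fn(baseline + a * (x - baseline)) for a in alphas]
    return (x - baseline) * np.mean(grads, axis=0)  # multiply by the input difference once
```

With a zero baseline this reduces to the original input times the averaged gradient, which is exactly the multiplication Eq. 1 prescribes.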

I don't see, in Shrikumar et al. (2016), evidence for your interpretation. Moreover, our implementation is consistent with the one provided by the authors of SmoothGrad (Smilkov et al. 2016, where the multiplication of gradient and input is also discussed): https://github.com/PAIR-code/saliency I will close this, but you can email me if you need more details.