marcoancona / DeepExplain

A unified framework of perturbation and gradient-based attribution methods for Deep Neural Networks interpretability. DeepExplain also includes support for Shapley Values sampling. (ICLR 2018)
https://arxiv.org/abs/1711.06104
MIT License

Can you give your approval for this description of DeepLIFT as implemented in the DeepExplain repo? #14

Closed AvantiShri closed 6 years ago

AvantiShri commented 6 years ago

Hello,

Thank you for this wonderful implementation. I'm the author of DeepLIFT, and I wanted to augment DeepLIFT's FAQ section to compare the implementation there with the implementation in this repo. This is what I have - can you give your approval or suggest edits?

Ancona et al., authors of the DeepExplain repository, leveraged overriding of the gradient operators in TensorFlow to implement the Rescale rule of DeepLIFT. Their implementation can work with a wider variety of architectures than the DeepLIFT implementation in this repository, and is potentially more computationally efficient, but it does not have the advantages of the RevealCancel rule (which deals with failure modes such as the min function). Note that their implementation can work with architectures that DeepLIFT was not designed for, such as LSTMs and GRUs (their implementation uses the standard gradient backpropagation rule in all cases where the gradient operator is not overridden). We have not studied the appropriateness of this approach, but the authors did find that “Integrated Gradients and DeepLIFT have very high correlation, suggesting that the latter is a good (and faster) approximation of the former in practice”.
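For readers unfamiliar with the Rescale rule, the gradient-override idea can be sketched in plain NumPy: at each nonlinearity, the instantaneous gradient is replaced by the ratio of the output difference to the input difference relative to a reference input. A minimal sketch on a toy one-layer ReLU network (all weights, inputs, and the reference below are hypothetical), checking the summation-to-delta property:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def rescale_multiplier(z, z_ref, eps=1e-10):
    # Rescale rule: replace the instantaneous gradient of the nonlinearity
    # with the ratio (f(z) - f(z_ref)) / (z - z_ref).
    dz = z - z_ref
    if abs(dz) < eps:
        # Degenerate case: fall back to the ordinary ReLU gradient.
        return float(z > 0)
    return (relu(z) - relu(z_ref)) / dz

# Toy network y = relu(w . x + b) with made-up weights and inputs.
w = np.array([1.0, -2.0, 0.5])
b = -0.5
x = np.array([1.0, 0.2, 2.0])
x_ref = np.zeros(3)  # reference (baseline) input

z, z_ref = w @ x + b, w @ x_ref + b
m = rescale_multiplier(z, z_ref)

# Backpropagate with the overridden multiplier in place of the gradient:
# contribution of each input feature to the output difference.
attributions = m * w * (x - x_ref)

# Summation-to-delta: attributions sum exactly to f(x) - f(x_ref).
delta_out = relu(z) - relu(z_ref)
print(np.allclose(attributions.sum(), delta_out))  # True
```

In DeepExplain itself this division happens inside a TensorFlow gradient override rather than explicitly, but the arithmetic is the same: the bias cancels in `z - z_ref`, so the full output difference is distributed over the inputs.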

marcoancona commented 6 years ago

Hi Avanti! Thanks for reaching out. Sure, you can put this in your FAQ section. I would also mention that if it is used with multiplicative units (i.e. gates) it might lead to unexpected results, and the summation-to-delta property no longer holds. Adapting to LSTMs or other architectures that use gates would require making different choices in the backpropagation procedure instead of blindly applying the chain rule, for example as Arras et al. did for LRP in https://arxiv.org/abs/1612.07843

AvantiShri commented 6 years ago

Thanks - yes I will mention that.