Closed AvantiShri closed 6 years ago
Hi Avanti! Thanks for reaching out. Sure, you can put this in your FAQ section. I would also mention that if it is used with multiplicative units (i.e. gates) it might lead to unexpected results, and the summation-to-delta property no longer holds. Adapting to LSTMs or other architectures that use gates would require making different choices in the backpropagation procedure instead of blindly applying the chain rule, as Arras et al. did for LRP in https://arxiv.org/abs/1612.07843
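To make the caveat concrete, here is a small sketch (the gate function and values are illustrative, not from either repo) of why summation-to-delta can fail on a multiplicative gate when contributions are computed by blindly applying the chain rule, i.e. gradient times input-difference:

```python
# Hypothetical demo: for a multiplicative gate f(a, b) = a * b,
# attributing via plain gradient * delta (the chain-rule approach)
# does not satisfy summation-to-delta.

def gate(a, b):
    return a * b

a, b = 3.0, 2.0      # actual inputs
a0, b0 = 1.0, 0.5    # reference inputs

delta_out = gate(a, b) - gate(a0, b0)   # 6.0 - 0.5 = 5.5

# Gradient-times-delta contributions at the actual point:
# df/da = b, df/db = a
contrib_a = b * (a - a0)   # 2.0 * 2.0 = 4.0
contrib_b = a * (b - b0)   # 3.0 * 1.5 = 4.5

total = contrib_a + contrib_b
print(total, delta_out)    # 8.5 vs. 5.5: the contributions do not sum to delta
```

The gap is exactly the cross term (a - a0) * (b - b0), which is why gated architectures need a dedicated backpropagation rule rather than the elementwise one.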
Thanks - yes I will mention that.
Hello,
Thank you for this wonderful implementation. I'm the author of DeepLIFT, and I wanted to augment DeepLIFT's FAQ section to compare the implementation there with the implementation in this repo. This is what I have - can you give your approval or suggest edits?
Ancona et al., authors of the DeepExplain repository, override the gradient operators in TensorFlow to implement the Rescale rule of DeepLIFT. Their implementation can work with a wider variety of architectures than the DeepLIFT implementation in this repository, and is potentially more computationally efficient, but it does not have the advantages of the RevealCancel rule (which deals with failure modes such as the min function). Note that their implementation can work with architectures that DeepLIFT was not designed for, such as LSTMs and GRUs (their implementation uses the standard gradient backpropagation rule wherever the gradient operator is not overridden). We have not studied the appropriateness of this approach, but the authors did find that “Integrated Gradients and DeepLIFT have very high correlation, suggesting that the latter is a good (and faster) approximation of the former in practice”.
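For readers unfamiliar with what the overridden gradient computes, here is a minimal NumPy sketch (a single ReLU layer; the function names are illustrative, not from either repo) of the Rescale rule: the nonlinearity's gradient is replaced by delta_output / delta_input, falling back to the ordinary gradient where delta_input is near zero:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rescale_multiplier(x, x_ref, eps=1e-10):
    """Rescale multiplier m = delta_out / delta_in for a ReLU,
    using the ordinary gradient where delta_in is (near) zero."""
    delta_in = x - x_ref
    delta_out = relu(x) - relu(x_ref)
    grad = (x > 0).astype(float)  # standard ReLU gradient
    nonzero = np.abs(delta_in) > eps
    safe_delta_in = np.where(nonzero, delta_in, 1.0)  # avoid divide-by-zero
    return np.where(nonzero, delta_out / safe_delta_in, grad)

x = np.array([2.0, -1.0, 0.5])
x_ref = np.array([-1.0, -2.0, 0.5])
m = rescale_multiplier(x, x_ref)
contribs = m * (x - x_ref)

# For a single nonlinearity, summation-to-delta holds elementwise:
assert np.allclose(contribs, relu(x) - relu(x_ref))
```

Backpropagating these multipliers in place of gradients through a whole network is exactly what the gradient override achieves, without modifying the model itself.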