Hi @durrantmm, I just saw this by chance. Since you mention shuffled sequences, it sounds like you are using genomic sequences as your input, in which case Marco isn't the right person to ask (Marco's DeepExplain provides a unified implementation of various interpretability algorithms, but he is not the point person for domain-specific usage of individual algorithms). The short answer to your question is yes, you have to supply the reference yourself. If you plan to use multiple references per example, the DeepExplain implementation isn't well suited to this. Instead, you should use either the original DeepLIFT implementation or, if the original DeepLIFT implementation does not work for your architecture, the DeepSHAP implementation. DeepSHAP is an extension of DeepLIFT; I discuss this in an FAQ on the DeepLIFT repo: https://github.com/kundajelab/deeplift#what-are-the-similarities-and-differences-between-the-deeplift-like-implementations-in-deepexplain-from-ancona-et-al-iclr-2018-and-deepshapdeepexplainer-from-the-shap-repository
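To make "multiple references per example" concrete, here is a minimal sketch that assumes a purely linear model f(x) = w · x. For such a model, the contribution of input i relative to a reference r is exactly w_i * (x_i - r_i), so with several references you can compute one attribution map per reference and average them. The function name `linear_multi_ref_attributions` is hypothetical and not part of DeepLIFT, DeepExplain or SHAP; real networks need DeepLIFT or DeepSHAP to propagate contributions through the nonlinearities.

```python
import numpy as np


def linear_multi_ref_attributions(w, x, refs):
    """Attributions for a linear model f(x) = w . x, averaged over references.

    Hypothetical helper for illustration only. For a linear model the
    contribution of input i relative to reference r is w_i * (x_i - r_i);
    one attribution map is computed per reference, then the maps are averaged.
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    refs = np.asarray(refs, dtype=float)  # shape: (n_refs, *x.shape)
    per_ref = w * (x - refs)              # broadcasts over the reference axis
    return per_ref.mean(axis=0)


if __name__ == "__main__":
    w = np.array([1.0, 2.0, 3.0])
    x = np.ones(3)
    refs = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]])
    attr = linear_multi_ref_attributions(w, x, refs)
    print(attr, attr.sum(), w @ x - (refs @ w).mean())
```

The averaged attributions still sum to f(x) minus the mean model output over the references, which is the summation-to-delta property that DeepLIFT-style methods are built around.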
For an example notebook that uses DeepSHAP and multiple shuffled sequences as the reference, you can refer to this: https://gist.github.com/AvantiShri/8a3a0a03f4c4a578ee7909e3989467cc
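If you want to build shuffled references yourself, a minimal sketch follows. It assumes sequences are one-hot encoded as an (L, 4) array in ACGT order, and it uses a simple mononucleotide shuffle: each reference keeps the input's base composition but not its order or dinucleotide frequencies. Dinucleotide-preserving shuffles are a common alternative in genomics, so treat this as an approximation. The helpers `one_hot` and `shuffled_references` are hypothetical and not part of DeepLIFT or SHAP; how the resulting array is passed in as the per-example reference set depends on the implementation (see the notebook above).

```python
import numpy as np


def one_hot(seq, alphabet="ACGT"):
    """Hypothetical helper: one-hot encode a DNA string into an (L, 4) array."""
    index = {base: i for i, base in enumerate(alphabet)}
    out = np.zeros((len(seq), len(alphabet)), dtype=np.float32)
    for pos, base in enumerate(seq):
        out[pos, index[base]] = 1.0
    return out


def shuffled_references(onehot, n_refs, seed=None):
    """Hypothetical helper: return n_refs position-shuffled copies of onehot.

    Mononucleotide shuffle: each copy is a random permutation of the rows
    (positions), so base composition is preserved but order is destroyed.
    Output shape: (n_refs, L, 4).
    """
    onehot = np.asarray(onehot)
    rng = np.random.default_rng(seed)
    length = onehot.shape[0]
    return np.stack([onehot[rng.permutation(length)] for _ in range(n_refs)])


if __name__ == "__main__":
    x = one_hot("AACGTTTG")
    refs = shuffled_references(x, n_refs=5, seed=1)
    print(refs.shape)
```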
Hope this helps. Feel free to direct other domain-specific questions about DeepLIFT to me.
Thanks @AvantiShri for answering this in detail. Indeed DeepExplain only supports a fixed baseline (usually the zero or mean baseline). Notice that there is still an open discussion about which reference baseline should be used (e.g. https://arxiv.org/abs/1908.08474).
Great, thank you very much Avanti!
Hello, I would like a more detailed description of the baseline parameter when using DeepLIFT. Is this something I need to generate myself, using shuffled sequences as a reference? Do you provide functions for doing this?