inseq-team / inseq

Interpretability for sequence generation models 🐛 🔍
https://inseq.org
Apache License 2.0

[Summary] Add gradient-based attribution methods #122

Open gsarti opened 2 years ago

gsarti commented 2 years ago

🚀 Feature Request

The following is a non-exhaustive list of gradient-based feature attribution methods that could be added to the library:

| Method name | Source | In Captum | Code implementation | Status |
| --- | --- | --- | --- | --- |
| DeepLiftSHAP | - | ✅ | pytorch/captum | |
| GradientSHAP¹ | Lundberg and Lee '17 | ✅ | pytorch/captum | |
| Guided Backprop | Springenberg et al. '15 | ✅ | pytorch/captum | |
| LRP² | Bach et al. '15 | ✅ | pytorch/captum | |
| Guided Integrated Gradients | Kapishnikov et al. '21 | | PAIR-code/saliency | |
| Projected Gradient Descent (PGD)³ | Madry et al. '18, Yin et al. '22 | | uclanlp/NLP-Interpretation-Faithfulness | |
| Sequential Integrated Gradients | Enguehard '23 | | josephenguehard/time_interpret | |
| Greedy PIG⁴ | Axiotis et al. '23 | | | |
| AttnLRP | Achtibat et al. '24 | | rachtibat/LRP-for-Transformers | |

Notes:

  - The Deconvolution method could also be added, but it appears to perform the same procedure as Guided Backprop, so it was left out to avoid duplication.

  1. The method was already present in Inseq but was removed due to instability in the single-example vs. batched setting; reintroducing it will require fixing this problem.
  2. Custom rules for the supported architectures need to be defined in order to adapt the LRP attribution method to our use case. An existing implementation of LRP rules for Transformer models in Tensorflow is available here: [lena-voita/the-story-of-heads](https://github.com/lena-voita/the-story-of-heads).
  3. The method leverages gradient information to perform adversarial replacement, so its placement in the gradient-based family should be reviewed.
  4. Similar to Sequential Integrated Gradients, but instead of focusing on one word at a time, at every iteration the top features identified by attribution are fixed (i.e. their baseline is set to the identity) and the remaining ones are attributed again in the next round (see the sketch after these notes).
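For reference, the sketch below spells out the fix-and-reattribute loop described in note 4. It is a minimal illustration under stated assumptions: `greedy_pig_sketch`, `attribute_fn`, `rounds`, and `top_k` are hypothetical placeholders rather than Inseq or Captum APIs, and the loop conveys the general idea rather than the exact procedure from the paper.

```python
import torch

def greedy_pig_sketch(attribute_fn, inputs, baseline, rounds=3, top_k=2):
    """Illustrative fix-and-reattribute loop (note 4). All names here are
    hypothetical placeholders, not Inseq or Captum APIs."""
    n_features = inputs.shape[-1]
    fixed = torch.zeros(n_features, dtype=torch.bool)
    total_scores = torch.zeros(n_features)
    current_baseline = baseline.clone()

    for _ in range(rounds):
        # attribute_fn stands in for any path-based attribution method
        # (e.g. integrated gradients) computed against the current baseline.
        scores = attribute_fn(inputs, current_baseline).detach().clone()
        scores[fixed] = float("-inf")  # ignore features fixed in earlier rounds
        top = torch.topk(scores, k=min(top_k, int((~fixed).sum()))).indices
        # "Fixing" a feature = setting its baseline to the identity (the input
        # itself), so later rounds attribute only the remaining features.
        current_baseline[..., top] = inputs[..., top]
        total_scores[top] = scores[top]
        fixed[top] = True
        if fixed.all():
            break
    return total_scores
```

In an actual integration, `attribute_fn` could be one of the path-based methods already wrapped by the library.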
saxenarohit commented 8 months ago

Is there a plan to add LRP to inseq?

gsarti commented 7 months ago

Hi @saxenarohit, in principle the Captum LRP implementation should be directly compatible with Inseq. However, the implementation is very model-specific, with some notable (and, to my knowledge, presently unsolved) issues with skip connections, which are the bread and butter of most transformer architectures used in Inseq (see pytorch/captum#546).
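For context, the minimal sketch below shows Captum's `LRP` on a toy, purely sequential network, the setting its built-in propagation rules are designed for; it is meant to illustrate the compatibility gap, not an Inseq integration.

```python
import torch
import torch.nn as nn
from captum.attr import LRP

# A toy feed-forward classifier made only of layers for which Captum ships
# default LRP propagation rules (Linear, ReLU).
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
model.eval()

inputs = torch.randn(1, 8)
attributions = LRP(model).attribute(inputs, target=0)
print(attributions.shape)  # torch.Size([1, 8])

# Transformer blocks in Inseq-supported models additionally contain residual
# (skip) connections and attention modules, for which no built-in propagation
# rule exists, hence the compatibility concern above (pytorch/captum#546).
```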

I think in general to proceed with an integration we should make sure that:

  1. at least the majority of Inseq-supported models would be compatible with propagation rules currently supported by Captum; and
  2. using LRP with unsupported architectures would raise informative errors to prevent involuntary misuse (a possible shape for such a check is sketched below).
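To make point 2 concrete, here is a hedged sketch of what such a guard could look like; the allow-list, function name, and error message are all hypothetical and would need to mirror whatever propagation rules the integration actually supports.

```python
import torch.nn as nn

# Hypothetical allow-list: the real set would have to mirror the layers for
# which LRP propagation rules are actually available in the integration.
SUPPORTED_LRP_MODULES = (nn.Linear, nn.Conv1d, nn.ReLU, nn.Dropout)

def check_lrp_compatibility(model: nn.Module) -> None:
    """Raise an informative error when a model contains leaf modules with no
    known LRP propagation rule (sketch of point 2, not actual Inseq code)."""
    unsupported = sorted(
        {
            type(module).__name__
            for module in model.modules()
            if not list(module.children())  # leaf modules only
            and not isinstance(module, SUPPORTED_LRP_MODULES)
        }
    )
    if unsupported:
        raise ValueError(
            "LRP attribution is not available for this model: no propagation "
            f"rule is defined for the following modules: {', '.join(unsupported)}. "
            "Please select a different attribution method."
        )
```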