inseq-team / inseq

Interpretability for sequence generation models 🐛 🔍
https://inseq.org
Apache License 2.0

[Summary] Add gradient-based attribution methods #122

Open gsarti opened 2 years ago

gsarti commented 2 years ago

🚀 Feature Request

The following is a non-exhaustive list of gradient-based feature attribution methods that could be added to the library:

| Method name | Source | In Captum | Code implementation | Status |
| --- | --- | --- | --- | --- |
| DeepLiftSHAP | - | ✅ | pytorch/captum | |
| GradientSHAP¹ | Lundberg and Lee '17 | ✅ | pytorch/captum | |
| Guided Backprop | Springenberg et al. '15 | ✅ | pytorch/captum | |
| LRP² | Bach et al. '15 | ✅ | pytorch/captum | |
| Guided Integrated Gradients | Kapishnikov et al. '21 | | PAIR-code/saliency | |
| Projected Gradient Descent (PGD)³ | Madry et al. '18, Yin et al. '22 | | uclanlp/NLP-Interpretation-Faithfulness | |
| Sequential Integrated Gradients | Enguehard '23 | | josephenguehard/time_interpret | |
| Greedy PIG⁴ | Axiotis et al. '23 | | | |
| AttnLRP | Achtibat et al. '24 | | rachtibat/LRP-for-Transformers | |

Notes:

  - The Deconvolution method could also be added, but it appears to perform the same procedure as Guided Backprop, so it was left out to avoid duplication.

  1. The method was already present in Inseq but was removed due to instability in the single-example vs. batched setting; reintroducing it will require fixing this problem.
  2. Custom rules for the supported architectures need to be defined in order to adapt the LRP attribution method to our use case. An existing implementation of LRP rules for Transformer models in Tensorflow is available here: [lena-voita/the-story-of-heads](https://github.com/lena-voita/the-story-of-heads).
  3. The method leverages gradient information to perform adversarial replacement, so its placement in the gradient-based family should be reviewed.
  4. Similar to Sequential Integrated Gradients, but instead of focusing on one word at a time, at every iteration the top features identified by attribution are fixed (i.e. their baseline is set to the identity) and the remaining ones are attributed again in the next round (see the sketch after these notes).
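For reference, the sketch below spells out the fix-and-reattribute loop described in note 4. It is a minimal illustration under stated assumptions: `greedy_pig_sketch`, `attribute_fn`, `rounds`, and `top_k` are hypothetical placeholders rather than Inseq or Captum APIs, and the loop conveys the general idea rather than the exact procedure from the paper.

```python
import torch

def greedy_pig_sketch(attribute_fn, inputs, baseline, rounds=3, top_k=2):
    """Illustrative fix-and-reattribute loop (note 4). All names here are
    hypothetical placeholders, not Inseq or Captum APIs."""
    n_features = inputs.shape[-1]
    fixed = torch.zeros(n_features, dtype=torch.bool)
    total_scores = torch.zeros(n_features)
    current_baseline = baseline.clone()

    for _ in range(rounds):
        # attribute_fn stands in for any path-based attribution method
        # (e.g. integrated gradients) computed against the current baseline.
        scores = attribute_fn(inputs, current_baseline).detach().clone()
        scores[fixed] = float("-inf")  # ignore features fixed in earlier rounds
        top = torch.topk(scores, k=min(top_k, int((~fixed).sum()))).indices
        # "Fixing" a feature = setting its baseline to the identity (the input
        # itself), so later rounds attribute only the remaining features.
        current_baseline[..., top] = inputs[..., top]
        total_scores[top] = scores[top]
        fixed[top] = True
        if fixed.all():
            break
    return total_scores
```

In an actual integration, `attribute_fn` could be one of the path-based methods already wrapped by the library.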
saxenarohit commented 8 months ago

Is there a plan to add LRP to inseq?

gsarti commented 7 months ago

Hi @saxenarohit, in principle the Captum LRP implementation should be directly compatible with Inseq. However, the implementation is very model-specific, with some notable (and, to my knowledge, presently unsolved) issues with skip connections, which are the bread and butter of most transformer architectures used in Inseq (see pytorch/captum#546).
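For context, the minimal sketch below shows Captum's `LRP` on a toy, purely sequential network, the setting its built-in propagation rules are designed for; it is meant to illustrate the compatibility gap, not an Inseq integration.

```python
import torch
import torch.nn as nn
from captum.attr import LRP

# A toy feed-forward classifier made only of layers for which Captum ships
# default LRP propagation rules (Linear, ReLU).
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
model.eval()

inputs = torch.randn(1, 8)
attributions = LRP(model).attribute(inputs, target=0)
print(attributions.shape)  # torch.Size([1, 8])

# Transformer blocks in Inseq-supported models additionally contain residual
# (skip) connections and attention modules, for which no built-in propagation
# rule exists, hence the compatibility concern above (pytorch/captum#546).
```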

I think in general to proceed with an integration we should make sure that:

  1. at least the majority of Inseq-supported models would be compatible with propagation rules currently supported by Captum; and
  2. using LRP with unsupported architectures would raise informative errors to prevent involuntary misuse (a possible shape for such a check is sketched below).
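To make point 2 concrete, here is a hedged sketch of what such a guard could look like; the allow-list, function name, and error message are all hypothetical and would need to mirror whatever propagation rules the integration actually supports.

```python
import torch.nn as nn

# Hypothetical allow-list: the real set would have to mirror the layers for
# which LRP propagation rules are actually available in the integration.
SUPPORTED_LRP_MODULES = (nn.Linear, nn.Conv1d, nn.ReLU, nn.Dropout)

def check_lrp_compatibility(model: nn.Module) -> None:
    """Raise an informative error when a model contains leaf modules with no
    known LRP propagation rule (sketch of point 2, not actual Inseq code)."""
    unsupported = sorted(
        {
            type(module).__name__
            for module in model.modules()
            if not list(module.children())  # leaf modules only
            and not isinstance(module, SUPPORTED_LRP_MODULES)
        }
    )
    if unsupported:
        raise ValueError(
            "LRP attribution is not available for this model: no propagation "
            f"rule is defined for the following modules: {', '.join(unsupported)}. "
            "Please select a different attribution method."
        )
```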