inseq-team / inseq

Interpretability for sequence generation models 🐛 🔍
https://inseq.org
Apache License 2.0
343 stars 37 forks source link

[Summary] Add perturbation feature attribution methods #107

Open gsarti opened 2 years ago

gsarti commented 2 years ago

🚀 Feature Request

The following is a non-exhaustive list of perturbation-based feature attribution methods that could be added to the library:

Method name Source In Captum Code implementation Status
(Layer) Feature Ablation1 - pytorch/captum
Occlusion Zeiler and Fergus '13 pytorch/captum
Shapley Value Sampling Castro et al. '09 pytorch/captum
Lime Ribeiro et al. '16 pytorch/captum
KernelShap Lundberg and Lee '17 pytorch/captum
Editing 2 - - -
Greedy Rationalization 3 Vafa et al. '21 - keyonvafa/sequential-rationales
Information Bottleneck Jiang et al. '20 - DFKI-NLP/thermostat
BayesLime Slack et al. '21 - dylan-slack/Modeling-Uncertainty-Local-Explainability
BayesSHAP Slack et al. '21 - dylan-slack/Modeling-Uncertainty-Local-Explainability
Input Reduction Feng et al. '18 - -
Input Marginalization Kim et al. '20 - -
Occlusion & Language Modeling Harbecke and Alt '20 - DFKI-NLP/OLM
Context Probing 4 Cífka and Liutkus '22 - cifkao/context-probing
Weighted SHAP Kwon and Zou '22 - ykwon0407/WeightedSHAP
Value Zeroing Mohebbi et al. '23 - hmohebbi/ValueZeroing #173
Comprehensiveness-as-a-metric Zhou et al. '23 - YilunZhou/solvability-explainer
Sufficiency-as-a-metric Zhou et al. '23 - YilunZhou/solvability-explainer
Causal Tracing Meng et al. '22 - kmeng01/rome
Attention Knockout5 Geva et al. '23 - -
ReAGent Zhao et al. '24 - casszhao/ReAGent #250
SyntaxSHAP Amara et al. '24 - k-amara/syntax-shap

Notes:

  1. For more information on Editing, see point 3 in #112 .

  1. Called ablation, but perform masking of features using a baseline.
  2. Editing replaces tokens with their nearest neighbors in the vocabulary embedding space and measures saliency as the drop in performance for the target. In the future, this can allow users to specify a custom editing strategy via an input Callable.
  3. Possibly overlapping with feature ablation up to some measure.
  4. Valid only for decoder-only models.
  5. Verify whether it would be exactly equivalent to Value Zeroing, include only if functionally different (alias otherwise).
nfelnlp commented 1 year ago

More methods related to Occlusion:

gsarti commented 1 year ago

Added to method table!