inseq-team / inseq

Interpretability for sequence generation models πŸ› πŸ”
https://inseq.org
Apache License 2.0
343 stars 37 forks

Inconsistent batching for DiscretizedIntegratedGradients attributions #113

Open gsarti opened 2 years ago

gsarti commented 2 years ago

πŸ› Bug Report

Despite the fix in #110, which made batched attribution consistent with individual attribution, the DiscretizedIntegratedGradients method still produces different results when applied to a batch of examples than when applied to the same examples individually.

πŸ”¬ How To Reproduce

  1. Instantiate an AttributionModel with the discretized_integrated_gradients method.
  2. Perform an attribution for a batch of examples.
  3. Perform an attribution for a single example present in the previous batch.
  4. Compare the attributions obtained in the two cases.

Code sample

import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "discretized_integrated_gradients")

out_multi = model.attribute(
    [
        "This aspect is very important",
        "Why does it work after the first?",
        "This thing smells",
        "Colorless green ideas sleep furiously"
    ],
    n_steps=20,
    return_convergence_delta=True,
)

out_single = model.attribute(
    [ "Why does it work after the first?" ],
    n_steps=20,
    return_convergence_delta=True,
)

assert out_single.attributions == out_multi[1].attributions # raises AssertionError
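Note that even once batching is fixed, exact equality between floating-point attribution tensors is a brittle check: minor numerical drift (e.g. from padding or kernel scheduling) can make bit-identical results unattainable. A tolerance-based comparison is more robust. The snippet below is a minimal illustration with stand-in arrays, not Inseq's actual output objects:

```python
import numpy as np

# Stand-in attribution scores for the same example attributed alone vs. in a
# batch; the tiny drift mimics ordinary floating-point noise.
single = np.array([0.1000000, 0.2000000])
multi = np.array([0.1000001, 0.2000001])

print(np.array_equal(single, multi))          # exact equality: False
print(np.allclose(single, multi, atol=1e-4))  # tolerance check: True
```

If the bug is fixed, an allclose-style comparison with a small tolerance is the appropriate acceptance criterion rather than strict equality.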

Environment

πŸ“ˆ Expected behavior

Same as #110

πŸ“Ž Additional context

The problem is most likely due to a faulty scaling of the gradients in the _attribute method of the DiscretizedIntegratedGradients class.
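To make the suspected failure mode concrete, here is a hedged sketch (illustrative shapes and names, not Inseq's or Captum's actual code) of Riemann-style gradient aggregation along a discretized path. Each example in a batch has its own path, so the step differences must be computed per example; if the scaling is shared or mis-broadcast across the batch, the results diverge from single-example attribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, batch, dim = 5, 2, 3

# paths[k, b, :] is the k-th interpolation point of example b's path
paths = rng.normal(size=(n_steps, batch, dim))
grads = rng.normal(size=(n_steps, batch, dim))

# Correct: per-example step differences along the path axis
steps = np.diff(paths, axis=0)                   # (n_steps-1, batch, dim)
attr_correct = (grads[:-1] * steps).sum(axis=0)  # (batch, dim)

# Faulty: step sizes averaged across the batch, mixing examples together
shared_steps = steps.mean(axis=1, keepdims=True)
attr_faulty = (grads[:-1] * shared_steps).sum(axis=0)

print(np.allclose(attr_correct, attr_faulty))    # False in general
```

This kind of shape/broadcast bug is invisible for batch size 1 (the mean over a singleton batch is a no-op), which would explain why single-example attribution looks correct while batched attribution does not.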

gsarti commented 1 year ago

Hi @soumyasanyal, FYI our library supports your Discretized IG method for feature attribution, but at the moment we are experiencing consistency issues between single-example and batched attribution (i.e. there appears to be a problem in the creation of the orthogonal approximation steps for a batch; see also #114 for additional info). It would be great if you could have a look!
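One cheap sanity check for the batched step construction is that each example's discretized path should progress monotonically from input to baseline, independently of the other examples in the batch. The sketch below is purely illustrative (the helper and shapes are hypothetical, not Inseq's API):

```python
import numpy as np

def paths_are_monotonic(paths):
    """paths: (n_steps, batch, dim). True iff, for every example and every
    dimension, the coordinates change monotonically along the step axis."""
    diffs = np.diff(paths, axis=0)          # (n_steps-1, batch, dim)
    non_dec = (diffs >= 0).all(axis=0)      # (batch, dim)
    non_inc = (diffs <= 0).all(axis=0)
    return bool((non_dec | non_inc).all())

# A well-formed batch: each example linearly interpolates input -> zero baseline
steps = np.linspace(0.0, 1.0, 5)[:, None, None]   # (5, 1, 1)
inputs = np.array([[1.0, -2.0], [0.5, 3.0]])      # (batch=2, dim=2)
lin_paths = (1 - steps) * inputs                  # (5, 2, 2)
print(paths_are_monotonic(lin_paths))             # True

# Reordering steps (as a batch-level construction bug might) breaks it
bad = lin_paths.copy()
bad[2], bad[4] = lin_paths[4], lin_paths[2]
print(paths_are_monotonic(bad))                   # False
```

A unit test along these lines, run on the actual batched step tensor, could catch regressions in the batch path construction without needing to compare full attribution outputs.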