inseq-team / inseq

Interpretability for sequence generation models πŸ› πŸ”
https://inseq.org
Apache License 2.0
343 stars 37 forks

Inconsistent batching for DiscretizedIntegratedGradients attributions #113

Open gsarti opened 2 years ago

gsarti commented 2 years ago

πŸ› Bug Report

Despite the fix in #110, which made batched attribution consistent with individual attribution, the DiscretizedIntegratedGradients method still produces different results when applied to a batch of examples than when applied to the same examples individually.

πŸ”¬ How To Reproduce

  1. Instantiate an AttributionModel with the discretized_integrated_gradients method.
  2. Perform an attribution for a batch of examples.
  3. Perform an attribution for a single example present in the previous batch.
  4. Compare the attributions obtained in the two cases.

Code sample

import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-de", "discretized_integrated_gradients")

out_multi = model.attribute(
    [
        "This aspect is very important",
        "Why does it work after the first?",
        "This thing smells",
        "Colorless green ideas sleep furiously"
    ],
    n_steps=20,
    return_convergence_delta=True,
)

out_single = model.attribute(
    [ "Why does it work after the first?" ],
    n_steps=20,
    return_convergence_delta=True,
)

assert out_single.attributions == out_multi[1].attributions # raises AssertionError
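Note that even once batching is fixed, exact equality between floating-point attribution tensors is a brittle check: minor numerical drift (e.g. from padding or kernel scheduling) can make bit-identical results unattainable. A tolerance-based comparison is more robust. The snippet below is a minimal illustration with stand-in arrays, not Inseq's actual output objects:

```python
import numpy as np

# Stand-in attribution scores for the same example attributed alone vs. in a
# batch; the tiny drift mimics ordinary floating-point noise.
single = np.array([0.1000000, 0.2000000])
multi = np.array([0.1000001, 0.2000001])

print(np.array_equal(single, multi))          # exact equality: False
print(np.allclose(single, multi, atol=1e-4))  # tolerance check: True
```

If the bug is fixed, an allclose-style comparison with a small tolerance is the appropriate acceptance criterion rather than strict equality.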

Environment

πŸ“ˆ Expected behavior

Same as #110

πŸ“Ž Additional context

The problem is most likely due to a faulty scaling of the gradients in the _attribute method of the DiscretizedIntegratedGradients class.
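To make the suspected failure mode concrete, here is a hedged sketch (illustrative shapes and names, not Inseq's or Captum's actual code) of Riemann-style gradient aggregation along a discretized path. Each example in a batch has its own path, so the step differences must be computed per example; if the scaling is shared or mis-broadcast across the batch, the results diverge from single-example attribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, batch, dim = 5, 2, 3

# paths[k, b, :] is the k-th interpolation point of example b's path
paths = rng.normal(size=(n_steps, batch, dim))
grads = rng.normal(size=(n_steps, batch, dim))

# Correct: per-example step differences along the path axis
steps = np.diff(paths, axis=0)                   # (n_steps-1, batch, dim)
attr_correct = (grads[:-1] * steps).sum(axis=0)  # (batch, dim)

# Faulty: step sizes averaged across the batch, mixing examples together
shared_steps = steps.mean(axis=1, keepdims=True)
attr_faulty = (grads[:-1] * shared_steps).sum(axis=0)

print(np.allclose(attr_correct, attr_faulty))    # False in general
```

This kind of shape/broadcast bug is invisible for batch size 1 (the mean over a singleton batch is a no-op), which would explain why single-example attribution looks correct while batched attribution does not.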

gsarti commented 1 year ago

Hi @soumyasanyal, FYI our library supports your Discretized IG method for feature attribution, but at the moment we are experiencing consistency issues between single-example and batched attribution (i.e. there appears to be a problem in the creation of the orthogonal approximation steps for a batch; see also #114 for additional info). It would be great if you could have a look!
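One cheap sanity check for the batched step construction is that each example's discretized path should progress monotonically from input to baseline, independently of the other examples in the batch. The sketch below is purely illustrative (the helper and shapes are hypothetical, not Inseq's API):

```python
import numpy as np

def paths_are_monotonic(paths):
    """paths: (n_steps, batch, dim). True iff, for every example and every
    dimension, the coordinates change monotonically along the step axis."""
    diffs = np.diff(paths, axis=0)          # (n_steps-1, batch, dim)
    non_dec = (diffs >= 0).all(axis=0)      # (batch, dim)
    non_inc = (diffs <= 0).all(axis=0)
    return bool((non_dec | non_inc).all())

# A well-formed batch: each example linearly interpolates input -> zero baseline
steps = np.linspace(0.0, 1.0, 5)[:, None, None]   # (5, 1, 1)
inputs = np.array([[1.0, -2.0], [0.5, 3.0]])      # (batch=2, dim=2)
lin_paths = (1 - steps) * inputs                  # (5, 2, 2)
print(paths_are_monotonic(lin_paths))             # True

# Reordering steps (as a batch-level construction bug might) breaks it
bad = lin_paths.copy()
bad[2], bad[4] = lin_paths[4], lin_paths[2]
print(paths_are_monotonic(bad))                   # False
```

A unit test along these lines, run on the actual batched step tensor, could catch regressions in the batch path construction without needing to compare full attribution outputs.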