pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License

Incorrect Attribution for T5 Seq2Seq Model in Captum #1110

Open mrektor opened 1 year ago

mrektor commented 1 year ago

Issue

I am trying to use the Captum library to get attributions for my sequence-to-sequence T5 model. However, the attributions are computed for the decoder's input token rather than the encoder's input tokens. This is not the desired outcome: I would expect attributions for either both the encoder and the decoder, or for the encoder alone.

Minimal Reproduction Code

Here is the minimal code to reproduce the issue:

from captum.attr import LayerGradientXActivation
from transformers import pipeline
pipe = pipeline('text2text-generation',
                model='t5-base',
                tokenizer='t5-base',
                device='cuda')

input_ids = pipe.tokenizer("A simple example", return_tensors="pt", padding=True, truncation=True).input_ids.to('cuda')

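# Embed the input ids manually so the forward pass can start from the embeddings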
embedding_layer = pipe.model.base_model.encoder.embed_tokens
inputs_emb = embedding_layer(input_ids)

def forward_from_embeddings(inputs):
    # Doing the first step of the forward pass manually
    decoder_input_ids = pipe.tokenizer(["<pad>"] * inputs.shape[0], add_special_tokens=False, truncation=True,
                                       return_tensors="pt").input_ids.to('cuda')
    logits = pipe.model.forward(inputs_embeds=inputs, decoder_input_ids=decoder_input_ids)['logits'][:, -1, :]
    return logits

lig = LayerGradientXActivation(forward_from_embeddings,
                               embedding_layer,
                               multiply_by_inputs=False)

attributes_token_embedding = lig.attribute(inputs=inputs_emb,
                                           target=10)
print(f'inputs_emb: {inputs_emb.shape}')
print(f'attributes_token_embedding: {attributes_token_embedding.shape}')

The code returns:

inputs_emb: torch.Size([1, 4, 768])
attributes_token_embedding: torch.Size([1, 1, 768])

As you can see, the second dimensions of the inputs_emb and attributes_token_embedding tensors differ. The desired outcome would be:

attributes_token_embedding: torch.Size([1, 4, 768])

Request for Help

I would appreciate any guidance on how to correctly obtain attributions for T5 models in Captum. Thank you for your assistance in resolving this issue.

michelecafagna26 commented 1 year ago

I'm also interested; I have the same issue.

gsarti commented 1 year ago

Here is a functioning version, @mrektor @michelecafagna26:

from captum.attr import InputXGradient
from transformers import pipeline
pipe = pipeline('text2text-generation',
                model='google/flan-t5-base',
                tokenizer='google/flan-t5-base',
                device='cuda')

input_ids = pipe.tokenizer(["A simple example"], return_tensors="pt", padding=True, truncation=True).input_ids.to('cuda')

embedding_layer = pipe.model.base_model.encoder.embed_tokens
inputs_emb = embedding_layer(input_ids)

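# Prime the decoder with a single <pad> start token per example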
decoder_input_ids = pipe.tokenizer(["<pad>"] * input_ids.shape[0], return_tensors="pt", add_special_tokens=False, truncation=True).input_ids.to('cuda')

decoder_embedding_layer = pipe.model.base_model.decoder.embed_tokens
decoder_inputs_emb = decoder_embedding_layer(decoder_input_ids)

def forward_from_embeddings(inputs_embeds, decoder_inputs_embeds):
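    # Next-token logits for the first decoding step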
    logits = pipe.model.forward(inputs_embeds=inputs_embeds, decoder_inputs_embeds=decoder_inputs_embeds)['logits'][:, -1, :]
    return logits

ixg = InputXGradient(forward_from_embeddings)

attributes_token_embedding = ixg.attribute(
    inputs=(inputs_emb, decoder_inputs_emb),
    target=10
)
print(f'inputs_emb: {inputs_emb.shape}')
print(f'attributes_token_embedding: {[e.shape for e in attributes_token_embedding]}')

The code returns:

inputs_emb: torch.Size([1, 4, 768])
attributes_token_embedding: [torch.Size([1, 4, 768]), torch.Size([1, 1, 768])]
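
For context on why the original snippet returns a [1, 1, 768] attribution: t5-base (and flan-t5-base) tie the encoder and decoder token embeddings to a single shared module, and since the encoder is called with inputs_embeds (bypassing its own embedding lookup), the hooked layer only fires for the decoder's single <pad> token. A quick check:

# Both attributes point to the same shared nn.Embedding in T5,
# so a layer hook on it also captures the decoder's lookup
print(pipe.model.base_model.encoder.embed_tokens
      is pipe.model.base_model.decoder.embed_tokens)  # True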

You might notice that forward_from_embeddings takes two parameters, while we pass a single tuple of tensors to attribute. This is because, behind the scenes, Captum matches each tensor in the inputs tuple to the corresponding positional parameter of the forward function and computes attributions for each of them.
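
Here is a minimal, self-contained sketch of that matching, independent of T5 (the toy forward function and shapes are made up for illustration):

import torch
from captum.attr import InputXGradient

def toy_forward(a, b):
    # One scalar per example; gradients flow to both a and b
    return (a * 2.0).sum(dim=1) + (b * 3.0).sum(dim=1)

a = torch.randn(1, 4)
b = torch.randn(1, 2)
# A tuple of N input tensors yields a tuple of N attribution tensors,
# each with the shape of the corresponding input
attrs = InputXGradient(toy_forward).attribute(inputs=(a, b))
print([t.shape for t in attrs])  # [torch.Size([1, 4]), torch.Size([1, 2])]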

If you want to avoid Captum's sharp edges when attributing generative models from 🤗 Transformers, I suggest you try our Inseq library. Here's an example that obtains the same results:

import inseq

model = inseq.load_model("google/flan-t5-base", "input_x_gradient")
# Attribute source and target prefix at every generation step
out = model.attribute("A simple example", attribute_target=True)
out

>>> FeatureAttributionOutput({
>>>     sequence_attributions: list with 1 elements of type GranularFeatureAttributionSequenceOutput:[
>>>         GranularFeatureAttributionSequenceOutput({
>>>             source: list with 4 elements of type TokenWithId:[
>>>                 '▁A', '▁simple', '▁example', '</s>'
>>>             ],
>>>             target: list with 18 elements of type TokenWithId:[
>>>                 '▁A', '▁', 's', 'and', '▁castle', '▁is', '▁', 'a', '▁place', '▁where', '▁you', '▁can', '▁build', '▁', 'a', '▁castle', '.', '</s>'
>>>             ],
>>>             source_attributions: torch.float32 tensor of shape [4, 18, 768] on cpu,
>>>             target_attributions: torch.float32 tensor of shape [18, 18, 768] on cpu,
>>>             step_scores: {},
>>>             sequence_scores: {},
>>>             attr_pos_start: 0,
>>>             attr_pos_end: 18,
>>>         })
>>>     ],
>>>     ...

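If you only need one score per source token rather than per-embedding-dimension gradients, you can collapse the hidden dimension of the tensors shown above yourself (a sketch based on the printed fields; Inseq also ships built-in aggregators and an out.show() visualization):

attr = out.sequence_attributions[0]
# L2 norm over the hidden dimension: one score per (source token, generated token) pair
scores = attr.source_attributions.norm(p=2, dim=-1)  # shape [4, 18]
for token, row in zip(attr.source, scores):
    print(token, row.mean().item())
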
Hope it helps!

mrektor commented 1 year ago

Thanks!! By the way, I didn't know about Inseq; that library looks amazing! Thanks for sharing.