Closed: sajjadriaj closed this issue 3 years ago
Hi @sajjadriaj, it seems that the issue here is that you are trying to compute attributions with respect to token indices, but we cannot actually compute gradients with respect to these indices, only with respect to the corresponding embeddings. More information regarding this can be found in this FAQ answer.
To compute Integrated Gradients for tokens, you would need to attribute with respect to the word embeddings rather than the input indices, which can be done either by overriding the embedding layer or by using a LayerAttribution method. Examples of these can be found in our BERT or IMDB tutorials here.
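For illustration, here is a minimal sketch of the LayerIntegratedGradients approach (not taken from the tutorials verbatim; model, tokenizer, input_ids, attention_mask, and the class index 1 are placeholders for your own objects, and the path to the embedding layer depends on your model):

# Minimal sketch: attribute with respect to the embedding layer rather than the integer token ids.
import torch
from captum.attr import LayerIntegratedGradients

def forward_func(input_ids, attention_mask):
    # Should return one row of scores per example, e.g. logits of shape [batch_size, num_classes].
    return model(input_ids, attention_mask)

# `model.embeddings` is a placeholder; use the embedding module of your own model.
lig = LayerIntegratedGradients(forward_func, model.embeddings)

# Baseline: the same-length sequence filled with the pad token id.
baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)

attributions, delta = lig.attribute(
    inputs=input_ids,
    baselines=baseline_ids,
    additional_forward_args=(attention_mask,),
    target=1,  # class index to explain (placeholder)
    return_convergence_delta=True,
)

The tutorials build the baseline indices with TokenReferenceBase, which is roughly equivalent to filling the sequence with the pad token id as above.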
Hello @vivekmig, thank you so much for pointing me towards my mistake. I have followed the IMDB example; however, I am now getting the following error:
AssertionError: Cannot choose target column with output shape torch.Size([4]).
Here is my updated code:
#%%
from itertools import cycle

from transformers import RobertaTokenizer, RobertaConfig, RobertaForSequenceClassification, RobertaModel
import torch
import torch.nn as nn
import torch.nn.functional as F

from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients, TokenReferenceBase
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

#%%
class SampleModel(nn.Module):
    def __init__(self):
        super(SampleModel, self).__init__()
        self.transformer_model = RobertaModel.from_pretrained('roberta-base', output_hidden_states=True)
        self.linear = nn.Linear(768, 4)

    def forward(self, input_ids, attention_masks):
        out = self.transformer_model(input_ids, token_type_ids=None, attention_mask=attention_masks)
        out = out[0].squeeze(0).mean(0)  # out[0]: [batch_size, seq_len, 768]
        out = self.linear(out)
        return out

#%%
def encode_sentences(tokenizer, article_list, evidence_list, max_len=512):
    input_ids = []
    attention_masks = []
    # Ensure article_list and evidence_list have equal length by cycling the shorter article_list.
    zip_list = zip(cycle(article_list), evidence_list) if len(article_list) < len(evidence_list) else zip(article_list, evidence_list)
    for article, evidence in zip_list:
        encoded_dict = tokenizer.encode_plus(
            text=evidence,               # Sentence to encode.
            text_pair=article,
            add_special_tokens=True,     # Add '[CLS]' and '[SEP]'
            max_length=max_len,          # Pad & truncate all sentences.
            pad_to_max_length=True,
            return_attention_mask=True,  # Construct attn. masks.
            return_tensors='pt',         # Return pytorch tensors.
        )
        input_ids.append(encoded_dict['input_ids'])
        attention_masks.append(encoded_dict['attention_mask'])
    # Convert the lists into tensors.
    input_ids = torch.cat(input_ids, dim=0)
    attention_masks = torch.cat(attention_masks, dim=0)
    return input_ids, attention_masks

def forward_with_softmax(input_ids, masks):
    output = model(input_ids, masks)
    # Returns the index of the highest softmax score.
    return torch.max(F.softmax(output, dim=-1), dim=0)[1]
#%%
article = "Manchester United is one of the most successful English football club in the history. The club has been performing poorely since their stark manager Alex Ferguson retired. However recently appointed manager, Ole is showing signs of promise in the rebuilding process."
evidence = "Ole Gunner Solskjaer is doing an excellent job with Manchester United"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
input_ids, masks = encode_sentences(tokenizer, [article], [evidence])
input_ids = input_ids.to(device)
masks = masks.to(device)
#%%
model = SampleModel().to(device)
lig = LayerIntegratedGradients(forward_with_softmax, model.transformer_model.embeddings)
token_reference = TokenReferenceBase(reference_token_idx=tokenizer.pad_token_id)
reference_indices = token_reference.generate_reference(sequence_length=512, device=device).unsqueeze(0)

# %%
out = model(input_ids, masks)

# %%
# model.transformer_model.get_input_embeddings()

# %%
attr, delta = lig.attribute(
    inputs=input_ids,
    baselines=reference_indices,
    additional_forward_args=(masks,),
    return_convergence_delta=True,
    n_steps=10,
    target=1,
)
print(attr)
Hi @sajjadriaj, Captum expects the number of examples to be the first dimension of the model output, so the output here should be a tensor of shape 1 x 4 rather than 4, and the model should return one row of output values per input example when the batch size is greater than 1. This line of the forward function, out = out[0].squeeze(0).mean(0), collapses the batch dimension and likely needs to be adjusted to produce the proper output dimensionality.
Generally, to compute attributions with Captum, you have to attribute with respect to a single output value per input example, with the index provided using the target argument. More information regarding this can be found here.
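For example, one way to keep the batch dimension with your current architecture (just a sketch; it mean-pools over the sequence dimension instead of squeezing out the batch dimension):

# Sketch: forward that preserves the batch dimension, so the output shape is [batch_size, 4].
def forward(self, input_ids, attention_masks):
    out = self.transformer_model(input_ids, token_type_ids=None, attention_mask=attention_masks)
    hidden = out[0]               # [batch_size, seq_len, 768]
    pooled = hidden.mean(dim=1)   # average over the sequence dimension -> [batch_size, 768]
    return self.linear(pooled)    # [batch_size, 4]

With a [batch_size, 4] output, target=1 then selects the second column for each example.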
I had the same issue with the RobertaForSequenceClassification model and was getting the exact same error. Just cast your target labels to long using .long() (i.e., target.long()) before passing them in. This solved my issue. Let me know whether it works for you.
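For example (a sketch only; `labels` here is a hypothetical tensor of class indices, one per example):

# Hypothetical `labels` tensor of class indices, one per example.
target = labels.long()  # ensure dtype is torch.int64
attr = lig.attribute(inputs=input_ids, additional_forward_args=(masks,), target=target)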
@sajjadriaj, it looks like it has been a while since this issue was opened. Do you still have questions related to it?
Hello,
I am pretty new to Captum. I am trying to run the IntegratedGradients method for transformer-based sequence classification. However, I get the following error:
Here is my code:
This issue seems very similar to this one: https://github.com/huggingface/transformers/issues/2952
I have tried casting the input_ids and masks to long tensors according to the suggestion, but it did not help.