Closed: sajjadriaj closed this issue 3 years ago
Hi @sajjadriaj, it seems that the issue here is that you are trying to compute attributions with respect to token indices, but we cannot actually compute gradients with respect to these indices, only with respect to the corresponding embeddings. More information regarding this can be found in this FAQ answer.
To compute Integrated Gradients for tokens, you would need to attribute with respect to the word embeddings rather than the input indices, which can be done either by overriding the embedding layer or by using a LayerAttribution method. Examples of these can be found in our BERT or IMDB tutorials here.
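For illustration, here is a minimal sketch of the LayerIntegratedGradients approach (not taken from the tutorials verbatim; model, tokenizer, input_ids, attention_mask, and the class index 1 are placeholders for your own objects, and the path to the embedding layer depends on your model):

# Minimal sketch: attribute with respect to the embedding layer rather than the integer token ids.
import torch
from captum.attr import LayerIntegratedGradients

def forward_func(input_ids, attention_mask):
    # Should return one row of scores per example, e.g. logits of shape [batch_size, num_classes].
    return model(input_ids, attention_mask)

# `model.embeddings` is a placeholder; use the embedding module of your own model.
lig = LayerIntegratedGradients(forward_func, model.embeddings)

# Baseline: the same-length sequence filled with the pad token id.
baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)

attributions, delta = lig.attribute(
    inputs=input_ids,
    baselines=baseline_ids,
    additional_forward_args=(attention_mask,),
    target=1,  # class index to explain (placeholder)
    return_convergence_delta=True,
)

The tutorials build the baseline indices with TokenReferenceBase, which is roughly equivalent to filling the sequence with the pad token id as above.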
Hello @vivekmig, thank you so much for pointing me towards my mistake. I have followed the IMDB example; however, I am now getting the following error:
AssertionError: Cannot choose target column with output shape torch.Size([4]).
Here is my updated code:
#%%
from itertools import cycle

from transformers import RobertaTokenizer, RobertaConfig, RobertaForSequenceClassification, RobertaModel
import torch
import torch.nn as nn
import torch.nn.functional as F

from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients, TokenReferenceBase
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

#%%
class SampleModel(nn.Module):
    def __init__(self):
        super(SampleModel, self).__init__()
        self.transformer_model = RobertaModel.from_pretrained('roberta-base', output_hidden_states=True)
        self.linear = nn.Linear(768, 4)

    def forward(self, input_ids, attention_masks):
        out = self.transformer_model(input_ids, token_type_ids=None, attention_mask=attention_masks)
        out = out[0].squeeze(0).mean(0)  # out[0]: [batch_size, seq_len, 768]
        out = self.linear(out)
        return out

#%%
def encode_sentences(tokenizer, article_list, evidence_list, max_len=512):
    input_ids = []
    attention_masks = []
    # Ensure article_list and evidence_list have equal length by cycling the shorter article_list.
    zip_list = zip(cycle(article_list), evidence_list) if len(article_list) < len(evidence_list) else zip(article_list, evidence_list)
    for article, evidence in zip_list:
        encoded_dict = tokenizer.encode_plus(
            text=evidence,               # Sentence to encode.
            text_pair=article,
            add_special_tokens=True,     # Add '[CLS]' and '[SEP]'
            max_length=max_len,          # Pad & truncate all sentences.
            pad_to_max_length=True,
            return_attention_mask=True,  # Construct attn. masks.
            return_tensors='pt',         # Return pytorch tensors.
        )
        input_ids.append(encoded_dict['input_ids'])
        attention_masks.append(encoded_dict['attention_mask'])
    # Convert the lists into tensors.
    input_ids = torch.cat(input_ids, dim=0)
    attention_masks = torch.cat(attention_masks, dim=0)
    return input_ids, attention_masks

def forward_with_softmax(input_ids, masks):
    output = model(input_ids, masks)
    # Returns the index of the highest softmax score.
    return torch.max(F.softmax(output, dim=-1), dim=0)[1]
#%%
article = "Manchester United is one of the most successful English football club in the history. The club has been performing poorely since their stark manager Alex Ferguson retired. However recently appointed manager, Ole is showing signs of promise in the rebuilding process."
evidence = "Ole Gunner Solskjaer is doing an excellent job with Manchester United"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
input_ids, masks = encode_sentences(tokenizer, [article], [evidence])
input_ids = input_ids.to(device)
masks = masks.to(device)
#%%
model = SampleModel().to(device)
lig = LayerIntegratedGradients(forward_with_softmax, model.transformer_model.embeddings)
token_reference = TokenReferenceBase(reference_token_idx=tokenizer.pad_token_id)
reference_indices = token_reference.generate_reference(sequence_length=512, device=device).unsqueeze(0)

# %%
out = model(input_ids, masks)

# %%
# model.transformer_model.get_input_embeddings()

# %%
attr, delta = lig.attribute(
    inputs=input_ids,
    baselines=reference_indices,
    additional_forward_args=(masks,),
    return_convergence_delta=True,
    n_steps=10,
    target=1,
)
print(attr)
Hi @sajjadriaj, Captum expects the number of examples to be the first dimension of the model output, so the output here should be a tensor of shape 1 x 4 rather than 4, and the model should return one row of output values per input example when the batch size is greater than 1. This line of the forward function, out = out[0].squeeze(0).mean(0), collapses the batch dimension and likely needs to be adjusted to produce the proper output dimensionality.
Generally, to compute attributions with Captum, you have to attribute with respect to a single output value per input example, with the index provided using the target argument. More information regarding this can be found here.
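For example, one way to keep the batch dimension with your current architecture (just a sketch; it mean-pools over the sequence dimension instead of squeezing out the batch dimension):

# Sketch: forward that preserves the batch dimension, so the output shape is [batch_size, 4].
def forward(self, input_ids, attention_masks):
    out = self.transformer_model(input_ids, token_type_ids=None, attention_mask=attention_masks)
    hidden = out[0]               # [batch_size, seq_len, 768]
    pooled = hidden.mean(dim=1)   # average over the sequence dimension -> [batch_size, 768]
    return self.linear(pooled)    # [batch_size, 4]

With a [batch_size, 4] output, target=1 then selects the second column for each example.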
I had the same issue with the RobertaForSequenceClassification model and was getting the exact same error. Just cast your target labels to long using .long() (i.e., target.long()) before passing them in. This solved my issue. Let me know whether it works for you.
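For example (a sketch only; `labels` here is a hypothetical tensor of class indices, one per example):

# Hypothetical `labels` tensor of class indices, one per example.
target = labels.long()  # ensure dtype is torch.int64
attr = lig.attribute(inputs=input_ids, additional_forward_args=(masks,), target=target)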
@sajjadriaj, it looks like it has been a while since this issue was opened. Do you still have questions related to it?
Hello,
I am pretty new to Captum. I am trying to run the IntegratedGradients method for transformer-based sequence classification. However, I get the following error:
Here is my code:
This issue seems very similar to this one: https://github.com/huggingface/transformers/issues/2952
I have tried casting the input_ids and masks to long tensors according to the suggestion, but it did not help.