pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License

Interpreting zero shot classification models #734

Closed subhamkhemka closed 2 years ago

subhamkhemka commented 3 years ago

Hi

I want to use Captum to interpret the facebook/bart-large-mnli pre-trained model on a zero-shot classification task.

I am only using the model for inference; no training is involved.

Can you please help with this query?

Thanks & Regards, Subham

bilalsal commented 3 years ago

Hi @subhamkhemka ,

Yes, you should be able to use Captum to attribute your model's output to its inputs (usually encoded by a tokenizer). You can further attribute an output to a layer (see the layer attribution algorithms) if you want to analyze or debug the internals of your BART model.
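For example, the core pattern is to hand an attribution algorithm a forward function that returns a tensor of scores, then attribute one output index to tensor inputs. A minimal runnable sketch, with a hypothetical toy classifier standing in for your model:

import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# hypothetical toy classifier standing in for your model
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
inputs = torch.randn(1, 8)  # a float tensor; attributions are computed w.r.t. this

ig = IntegratedGradients(model)                # forward must return a tensor of scores
attributions = ig.attribute(inputs, target=2)  # explain the score at output index 2
print(attributions.shape)                      # same shape as inputs: torch.Size([1, 8])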

Do you have a specific scenario in mind? We can gladly offer advice on how Captum could help there.

subhamkhemka commented 3 years ago

Hi @bilalsal

I am using the Hugging Face transformers library for the zero-shot classification task:

from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=0)

sequence = "This personalised ceramic piggy bank with stunning hand drawn Alice in Wonderland design is perfect gift"
candidate_labels = ['wood', 'metal', 'eco friendly', 'sustainable', 'faux leather', 'soft vegan leather', 'led', 'solar']  # only a few tags in this example

classifier(sequence, candidate_labels)

Output:

{'sequence': 'This personalised ceramic piggy bank with stunning hand drawn Alice in Wonderland design is perfect gift',
 'labels': ['handmade'],
 'scores': [0.7525588274002075]}

I would like to find the words which highly contributed to the prediction "handmade". I have gone through the list of algorithms and think Integrated Gradients would be helpful.

How do I use this with the above code?

Please help

Regards, Subham

bilalsal commented 3 years ago

Hi Subham, apologies for my belated response.

Since Captum expects a tensor as the output rather than a dict, we first need to wrap the model with a wrapper that returns the underlying tensor. AFAIK, the transformers model should return a 1D tensor that stores the scores for ALL labels in candidate_labels. Is this true in your setup? If yes, your wrapper can simply return that tensor:

class WrapperModel(nn.Module):

    def __init__(self, classifier):
        super().__init__()
        self.classifier = classifier

    def forward(self, sequence, candidate_labels):
        output = self.classifier(sequence, candidate_labels)
        return output.scores

ig = IntegratedGradients(WrapperModel(classifier))
sequence = "This personalised ceramic piggy bank with stunning hand drawn Alice in Wonderland design is perfect gift"
candidate_labels = ['wood', 'metal', 'eco friendly', 'sustainable', 'faux leather', 'soft vegan leather', 'led', 'solar']  # only a few tags in this example
label_ind = 1  # compute attribution for 'metal'
attributions_ig = ig.attribute(sequence, target=label_ind, additional_forward_args=candidate_labels)

Hope this helps,
Bilal

subhamkhemka commented 3 years ago

Hi Bilal,

Thanks for getting back to me and sharing the code snippet.

Running the pipeline code on its own produces a dictionary:

from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=0)
sequence = "This personalised ceramic piggy bank with stunning hand drawn Alice in Wonderland design is perfect gift"
candidate_labels = ['wood', 'metal', 'eco friendly', 'sustainable', 'faux leather', 'soft vegan leather', 'led', 'solar']  # only a few tags in this example
classifier(sequence, candidate_labels)

The output is a dictionary in which the 'scores' key holds a list of floats:

{'sequence': 'This personalised ceramic piggy bank with stunning hand drawn Alice in Wonderland design is perfect gift',
 'labels': ['eco friendly',
  'sustainable',
  'led',
  'wood',
  'solar',
  'soft vegan leather',
  'metal',
  'faux leather'],
 'scores': [0.9233949780464172,
  0.07328024506568909,
  0.0007113866158761084,
  0.0005779403727501631,
  0.0005733320722356439,
  0.0005175700644031167,
  0.0004898202023468912,
  0.000454788445495069]}

When I run the code you have suggested:

import torch
import torch.nn as nn
import torch.nn.functional as F
from captum.attr import IntegratedGradients
from transformers import pipeline

class WrapperModel(nn.Module):

    def __init__(self, classifier):
        super().__init__()
        self.classifier = classifier

    def forward(self, sequence, candidate_labels):
        output = self.classifier(sequence, candidate_labels)
        return output.scores
        #### return torch.tensor(output['scores'])

classifier = pipeline("zero-shot-classification", model="typeform/distilbert-base-uncased-mnli")
ig = IntegratedGradients(WrapperModel(classifier))
sequence = "This personalised ceramic piggy bank with stunning hand drawn Alice in Wonderland design is perfect gift"
candidate_labels = ['wood', 'metal', 'eco friendly', 'sustainable', 'faux leather', 'soft vegan leather', 'led', 'solar']  # only a few tags in this example
label_ind = 1  # compute attribution for 'metal'
attributions_ig = ig.attribute(sequence, target=label_ind, additional_forward_args=candidate_labels)

I get the error below. I also tried changing output.scores to torch.tensor(output['scores']) (the commented line above).

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-8-018d7793f6d1> in <module>
----> 1 attributions_ig = ig.attribute(sequence, target=label_ind, additional_forward_args=candidate_labels)

/opt/anaconda3/lib/python3.8/site-packages/captum/log/__init__.py in wrapper(*args, **kwargs)
     33             @wraps(func)
     34             def wrapper(*args, **kwargs):
---> 35                 return func(*args, **kwargs)
     36 
     37             return wrapper

/opt/anaconda3/lib/python3.8/site-packages/captum/attr/_core/integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta)
    266         is_inputs_tuple = _is_tuple(inputs)
    267 
--> 268         inputs, baselines = _format_input_baseline(inputs, baselines)
    269 
    270         _validate_input(inputs, baselines, n_steps, method)

/opt/anaconda3/lib/python3.8/site-packages/captum/attr/_utils/common.py in _format_input_baseline(inputs, baselines)
     83     inputs: Union[Tensor, Tuple[Tensor, ...]], baselines: BaselineType
     84 ) -> Tuple[Tuple[Tensor, ...], Tuple[Union[Tensor, int, float], ...]]:
---> 85     inputs = _format_input(inputs)
     86     baselines = _format_baseline(baselines, inputs)
     87     return inputs, baselines

/opt/anaconda3/lib/python3.8/site-packages/captum/_utils/common.py in _format_input(inputs)
    157 
    158 def _format_input(inputs: Union[Tensor, Tuple[Tensor, ...]]) -> Tuple[Tensor, ...]:
--> 159     return _format_tensor_into_tuples(inputs)
    160 
    161 

/opt/anaconda3/lib/python3.8/site-packages/captum/_utils/common.py in _format_tensor_into_tuples(inputs)
    149         return None
    150     if not isinstance(inputs, tuple):
--> 151         assert isinstance(
    152             inputs, torch.Tensor
    153         ), "`inputs` must have type " "torch.Tensor but {} found: ".format(type(inputs))

AssertionError: `inputs` must have type torch.Tensor but <class 'str'> found: 

It may be a simple error; I am a beginner, so I am not sure how to handle it.

Kindly assist

Thanks, Subham

bilalsal commented 3 years ago

Hi Subham,

apologies, I failed to check that the input expected by the zero-shot-classification pipeline is a string, not a tensor. Internally, PyTorch models operate on tensors: in the pipeline you are using, strings are implicitly converted to tensors via a tokenizer. Since Captum can only operate on tensor inputs and outputs, we need to break down the operations packaged inside this transformers pipeline into individual steps so that we can access the tensor inputs.

This link explains how this can be done: https://huggingface.co/facebook/bart-large-mnli

Based on that, for every label in your candidate_labels, you can run the model manually and apply Captum as follows:

from captum.attr import IntegratedGradients
from transformers import AutoModelForSequenceClassification, AutoTokenizer

nli_model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli')
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')

premise = sequence                        # your product description from above
hypothesis = f'This example is {label}.'  # 'label' is one entry of candidate_labels

# run through the model pre-trained on MNLI
x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                     truncation_strategy='only_first')

ig = IntegratedGradients(nli_model)
entailment_ind = 2  # index of the 'entailment' class in the MNLI head
attributions_ig = ig.attribute(x, target=entailment_ind, additional_forward_args=candidate_labels)
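For reference, the model card linked above turns the logits for one hypothesis into a probability for the label by dropping the neutral logit and softmaxing over contradiction vs. entailment:

# from the facebook/bart-large-mnli model card
logits = nli_model(x)[0]
entail_contradiction_logits = logits[:, [0, 2]]  # keep contradiction (0) and entailment (2)
probs = entail_contradiction_logits.softmax(dim=1)
prob_label_is_true = probs[:, 1]                 # probability that the label applies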

If target=entailment_ind does not work, let me know what you get if you run logits = nli_model(x.to(device))[0] and I can try to help you define the right target.

Bilal

subhamkhemka commented 3 years ago

Hi Bilal,

Thanks for getting back so quickly. I ran the code below as per your suggestions:

from captum.attr import IntegratedGradients
from transformers import AutoModelForSequenceClassification, AutoTokenizer
nli_model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli')
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')
sequence = "This personalised ceramic piggy bank with stunning hand drawn Alice in Wonderland design is perfect gift"
candidate_labels = ['wood', 'metal', 'eco friendly', 'sustainable', 'faux leather', 'soft vegan leather', 'led', 'solar']  # only a few tags in this example

premise = sequence
hypothesis = f'This example is {candidate_labels}.'

# run through model pre-trained on MNLI
x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                     truncation_strategy='only_first')

ig = IntegratedGradients(nli_model)
entailment_ind = 2
attributions_ig = ig.attribute(x, target=entailment_ind, additional_forward_args=candidate_labels)

I get this error -

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-c5a3e3734782> in <module>
      8 ig = IntegratedGradients(nli_model)
      9 entailment_ind = 2
---> 10 attributions_ig = ig.attribute(x, target=entailment_ind, additional_forward_args=candidate_labels)

/opt/anaconda3/lib/python3.8/site-packages/captum/log/__init__.py in wrapper(*args, **kwargs)
     33             @wraps(func)
     34             def wrapper(*args, **kwargs):
---> 35                 return func(*args, **kwargs)
     36 
     37             return wrapper

/opt/anaconda3/lib/python3.8/site-packages/captum/attr/_core/integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta)
    284             )
    285         else:
--> 286             attributions = self._attribute(
    287                 inputs=inputs,
    288                 baselines=baselines,

/opt/anaconda3/lib/python3.8/site-packages/captum/attr/_core/integrated_gradients.py in _attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, step_sizes_and_alphas)
    349 
    350         # grads: dim -> (bsz * #steps x inputs[0].shape[1:], ...)
--> 351         grads = self.gradient_func(
    352             forward_fn=self.forward_func,
    353             inputs=scaled_features_tpl,

/opt/anaconda3/lib/python3.8/site-packages/captum/_utils/gradient.py in compute_gradients(forward_fn, inputs, target_ind, additional_forward_args)
    110     with torch.autograd.set_grad_enabled(True):
    111         # runs forward pass
--> 112         outputs = _run_forward(forward_fn, inputs, target_ind, additional_forward_args)
    113         assert outputs[0].numel() == 1, (
    114             "Target not provided when necessary, cannot"

/opt/anaconda3/lib/python3.8/site-packages/captum/_utils/common.py in _run_forward(forward_func, inputs, target, additional_forward_args)
    448     additional_forward_args = _format_additional_forward_args(additional_forward_args)
    449 
--> 450     output = forward_func(
    451         *(*inputs, *additional_forward_args)
    452         if additional_forward_args is not None

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/opt/anaconda3/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py in forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, cross_attn_head_mask, encoder_outputs, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
   1419             )
   1420 
-> 1421         outputs = self.model(
   1422             input_ids,
   1423             attention_mask=attention_mask,

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/opt/anaconda3/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py in forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, cross_attn_head_mask, encoder_outputs, past_key_values, inputs_embeds, decoder_inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
   1155 
   1156         if encoder_outputs is None:
-> 1157             encoder_outputs = self.encoder(
   1158                 input_ids=input_ids,
   1159                 attention_mask=attention_mask,

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/opt/anaconda3/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py in forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
    750 
    751         if inputs_embeds is None:
--> 752             inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale
    753 
    754         embed_pos = self.embed_positions(input_shape)

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/sparse.py in forward(self, input)
    154 
    155     def forward(self, input: Tensor) -> Tensor:
--> 156         return F.embedding(
    157             input, self.weight, self.padding_idx, self.max_norm,
    158             self.norm_type, self.scale_grad_by_freq, self.sparse)

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1914         # remove once script supports set_grad_enabled
   1915         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1916     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1917 
   1918 

RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)

The output of logits = nli_model(x)[0] is tensor([[ 5.6667, -2.3392, -3.8242]], grad_fn=<AddmmBackward>)

I had one more doubt: am I setting the hypothesis parameter with candidate_labels correctly? Is it fine to include all the labels at once for my use case, or should I add one label at a time and repeat in a loop to get the attributions for all of them?

Thanks, Subham

bilalsal commented 3 years ago

Hi Subham,

When using the model manually, you need to run it for every label in your candidate_labels (so yes, you would need a loop).
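Something along these lines (a sketch of what I mean, reusing premise, tokenizer, ig, and entailment_ind from the snippets above):

all_attributions = {}
for label in candidate_labels:
    hypothesis = f'This example is {label}.'  # one label per hypothesis, not the whole list
    x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                         truncation_strategy='only_first')
    all_attributions[label] = ig.attribute(x, target=entailment_ind)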

Also, could you try running the Captum attribution as follows? attributions_ig = ig.attribute(x, target=(0, 2))

subhamkhemka commented 3 years ago

Hi Bilal,

I have run the following code:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
nli_model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli')
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')

sequence = "This personalised ceramic piggy bank with stunning hand drawn Alice in Wonderland design is perfect gift"
candidate_labels=['wood']

from captum.attr import IntegratedGradients

premise = sequence
hypothesis = f'This example is {candidate_labels}.'

# run through model pre-trained on MNLI
x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                     truncation_strategy='only_first')

ig = IntegratedGradients(nli_model)
attributions_ig = ig.attribute(x, target=(0, 2))

Note: candidate_labels is a list with only one label.

I am getting the same error as my last comment.

The output of logits = nli_model(x)[0] is tensor([[ 4.8888, -2.4616, -3.0163]], grad_fn=<AddmmBackward>)

Thanks, Subham

bilalsal commented 3 years ago

Hi Subham,

can you try with candidate_labels='wood'?

subhamkhemka commented 3 years ago

Hi Bilal,

I get the same error with candidate_labels='wood'.

bilalsal commented 3 years ago

Hi Subham,

I failed to realize that the issue is probably due to IntegratedGradients' reliance on FloatTensor as the input type, while nli_model expects a tensor of integers (token indices). This is because IntegratedGradients works by sampling several points along the straight line connecting your input x to the provided baseline (a zero tensor if none is provided); those interpolated points are generally non-integer, so they are no longer valid token indices.

This explains why logits = nli_model(x) works while ig.attribute(x, ...) does not: x is an IntTensor because it is the output of a tokenizer, but IG feeds the model scaled FloatTensor versions of it.
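To see the problem concretely, here is a sketch (with made-up token ids) of the kind of input IG actually feeds the model, halfway between a zero baseline and x:

import torch

x = torch.tensor([[0, 713, 7091, 2]])  # hypothetical integer token ids from a tokenizer
baseline = torch.zeros_like(x).float()
alpha = 0.5                            # one of IG's interpolation steps
scaled = baseline + alpha * (x.float() - baseline)
print(scaled)  # tensor([[0.0000, 356.5000, 3545.5000, 1.0000]]) -- not valid token indices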

To circumvent the above issue, please refer to Captum's BERT tutorial. You need to pay attention to the following:

- how the tutorial computes attributions at the embedding layer (via LayerIntegratedGradients) rather than at the integer token indices, and
- how it constructs a reference (baseline) input out of reference token indices instead of a zero tensor.

You can find a simpler tutorial here to understand the above two critical points. Unlike the BERT tutorial, the latter uses a simple CNN model with an nn.Embedding layer and a simple spacy tokenizer.

Please also refer to the TokenReferenceBase utility, which is used in both tutorials.
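Put together, a sketch of what those tutorials do, applied here (assuming BART's token embedding lives at nli_model.model.shared and using the pad token as the reference; adjust to your setup):

from captum.attr import LayerIntegratedGradients, TokenReferenceBase

def forward_func(input_ids):
    return nli_model(input_ids)[0]  # return the logits tensor

# attribute at the embedding layer, so IG can interpolate in float space
lig = LayerIntegratedGradients(forward_func, nli_model.model.shared)

# baseline: a sequence of pad tokens instead of a zero tensor
token_reference = TokenReferenceBase(reference_token_idx=tokenizer.pad_token_id)
reference_ids = token_reference.generate_reference(x.shape[1], device='cpu').unsqueeze(0)

attributions = lig.attribute(x, baselines=reference_ids, target=entailment_ind)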

Another way to circumvent the issue with IntegratedGradients is to use interpretable embeddings. This would be a bit trickier to start with, however.
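A sketch of that pattern on a hypothetical toy model with an nn.Embedding layer named 'emb' (for BART the wrapped layer would be its shared token embedding, which needs extra care, hence trickier):

import torch
import torch.nn as nn
from captum.attr import (IntegratedGradients,
                         configure_interpretable_embedding_layer,
                         remove_interpretable_embedding_layer)

class ToyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(100, 16)
        self.head = nn.Linear(16, 3)

    def forward(self, input_ids):
        return self.head(self.emb(input_ids).mean(dim=1))  # (batch, 3) logits

model = ToyClassifier()
token_ids = torch.tensor([[5, 42, 7]])

# swap the embedding layer for a pass-through wrapper, then attribute in float space
interpretable_emb = configure_interpretable_embedding_layer(model, 'emb')
input_emb = interpretable_emb.indices_to_embeddings(token_ids)  # float (1, 3, 16)

ig = IntegratedGradients(model)
attributions = ig.attribute(input_emb, target=2)  # w.r.t. embeddings, not indices

remove_interpretable_embedding_layer(model, interpretable_emb)  # restore the model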

Hope this helps,
Bilal