pytorch / captum

Model interpretability and understanding for PyTorch
https://captum.ai
BSD 3-Clause "New" or "Revised" License

Integrated Gradient For Intent Classification & Ner #510

Open ducpvftech opened 3 years ago

ducpvftech commented 3 years ago

Hi everyone,

I'm trying to use Captum with OneNet (built with AllenNLP); this is the model structure:

OneNet(
  (text_field_embedder): BasicTextFieldEmbedder(
    (token_embedder_token_characters): TokenCharactersEncoder(
      (_embedding): TimeDistributed(
        (_module): Embedding()
      )
      (_encoder): TimeDistributed(
        (_module): CnnEncoder(
          (_activation): ReLU()
          (conv_layer_0): Conv1d(3, 128, kernel_size=(3,), stride=(1,))
        )
      )
    )
    (token_embedder_tokens): Embedding()
  )
  (encoder): PytorchSeq2SeqWrapper(
    (_module): LSTM(178, 200, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
  )
  (dropout): Dropout(p=0.5, inplace=False)
  (tag_projection_layer): TimeDistributed(
    (_module): Linear(in_features=400, out_features=57, bias=True)
  )
  (intent_projection_layer): Linear(in_features=400, out_features=20, bias=True)
  (ce_loss): CrossEntropyLoss()
)

And this is a sample output

sample = "xe day con mau vàng k sh"

{'tag_logits': array([[  8.193845,  18.159725,   3.070817,  -3.669226, ...,  -7.021739,  -9.783165, -10.414617, -14.490005],
        [ 11.836643,   4.574325,  17.798481,  -0.146769, ...,  -7.323572,  -8.025657, -11.729625, -15.194502],
        [ 13.941337,  -3.660825,   8.000876,   2.282541, ...,  -9.944183, -12.441767, -12.626238, -19.164455],
        [  4.384309,  -4.350079,  -4.387915,   2.233547, ...,  -9.741117,  -9.724459, -12.436659, -15.250616],
        [  2.312785,  -6.687048,  -6.087758,  -2.759617, ...,  -3.623748,   1.016447,  -6.195989,  -5.572791],
        [ 16.199913,  -3.463409,  -1.805555,  -3.65419 , ...,  -6.689859,  -1.246313,  -6.765724,  -7.277429],
        [ 15.870321,  -0.451358,  -3.963183,  -3.106677, ...,  -7.761865,  -7.660899,  -7.337141, -12.257715]],
       dtype=float32),
 'mask': array([1, 1, 1, 1, 1, 1, 1]),
 'tags': ['B-inform#object_type',
  'I-inform#object_type',
  'O',
  'B-inform#color',
  'I-inform#color',
  'O',
  'O'],
 'intent_probs': array([4.672179e-04, 9.995320e-01, 4.606894e-07, 7.099485e-09, 7.334805e-08, 5.847836e-09, 5.091730e-08, 3.163775e-09,
        3.281502e-09, 2.609285e-09, 1.424896e-11, 1.173377e-08, 1.556584e-09, 5.412061e-08, 4.719907e-08, 1.678568e-08,
        9.755362e-09, 1.716321e-08, 3.199067e-09, 4.867611e-10], dtype=float32),
 'words': ['xe', 'day', 'con', 'mau', 'vàng', 'k', 'sh'],
 'intent': 'inform',
 'span_tags': [('inform#object_type', 'inform#object_type (0, 1): xe day'),
  ('inform#color', 'inform#color (3, 4): mau vàng')],
 'nlu': {'inform': [('inform#object_type',
    'inform#object_type (0, 1): xe day'),
   ('inform#color', 'inform#color (3, 4): mau vàng')]}}

I got an error when using LayerIntegratedGradients:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 2. Got 1 and 500 in dimension 0 at /pytorch/aten/src/TH/generic/THTensor.cpp:612

Here is the notebook: https://gist.github.com/ducpvftech/1cdb03429a7b9dbf7036d5c4c511ec45

Can you guide me on how to make this work for both intent classification and NER?

Thank you for considering my request.

bilalsal commented 3 years ago

Hey @ducpvftech ,

The error seems to take place in the forward() method of your AllenNLP model, which is called through learner.model.forward_on_instance() in cell 16 of your notebook. Specifically, the embeddings computed by the model seem to be of different sizes. Does it have to do with the tokens being of different sizes?

I recommend you test your forward_fn() with some baseline inputs to make sure it can handle the samples you are feeding. Once forward_fn() works without errors, you should face no issues applying Captum's attribute() methods to your model and input.

Hope this helps

ducpvftech commented 3 years ago

Hi @bilalsal ,

I tested my forward_fn():

def forward_fn(sample):

    print("forward_fn: ", sample)
    int2token = [learner.model.vocab._index_to_token["tokens"][i] for i in sample[0].numpy()]
    token2str = " ".join(int2token)
    tokens = [Token(s) for s in token2str.split()]
    instance = learner.dataset_reader.text_to_instance(tokens)
    intent_probs_tensor = torch.tensor(learner.model.forward_on_instance(instance)['intent_probs'])
    probs_squeeze = intent_probs_tensor.unsqueeze(0)  # add a batch dimension -> (1, num_intents)
    print("probs_squeeze: ", probs_squeeze.shape)
    return torch.softmax(probs_squeeze, dim=-1)

### test base-line
sample = [0, 0, 0, 0, 0, 0, 0]
sample = torch.tensor(sample)
sample = sample.unsqueeze(0)
forward_fn(sample)

The result is fine

forward_fn:  tensor([[0, 0, 0, 0, 0, 0, 0]])
probs_squeeze:  torch.Size([1, 20])

tensor([[0.0508, 0.0467, 0.0467, 0.0467, 0.0467, 0.0467, 0.0467, 0.0467, 0.0468,
         0.0467, 0.0509, 0.0467, 0.0467, 0.0467, 0.0467, 0.1035, 0.0467, 0.0467,
         0.0481, 0.0467]])

Now I get another error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-43-61b7fe348def> in <module>()
----> 1 interpret_sentence(learner, 'xe day con mau vàng k sh', label=1)

<ipython-input-42-c330ba691776> in interpret_sentence(learn, sentence, min_len, label)
     48     # print(f"target: {target}")
     49     attributions_ig, delta = lig.attribute(input_indices, reference_indices, 0, 
---> 50                                            n_steps=1, return_convergence_delta=True)
     51 
     52 

/home/nqtuan/miniconda3/envs/xai/lib/python3.6/site-packages/captum/attr/_core/layer/layer_integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta, attribute_to_layer_input)
    358             method=method,
    359             internal_batch_size=internal_batch_size,
--> 360             return_convergence_delta=False,
    361         )
    362 

/home/nqtuan/miniconda3/envs/xai/lib/python3.6/site-packages/captum/attr/_core/integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta)
    282             internal_batch_size=internal_batch_size,
    283             forward_fn=self.forward_func,
--> 284             target_ind=expanded_target,
    285         )
    286 

/home/nqtuan/miniconda3/envs/xai/lib/python3.6/site-packages/captum/attr/_utils/batching.py in _batched_operator(operator, inputs, additional_forward_args, target_ind, internal_batch_size, **kwargs)
    162         )
    163         for input, additional, target in _batched_generator(
--> 164             inputs, additional_forward_args, target_ind, internal_batch_size
    165         )
    166     ]

/home/nqtuan/miniconda3/envs/xai/lib/python3.6/site-packages/captum/attr/_utils/batching.py in <listcomp>(.0)
    161             **kwargs
    162         )
--> 163         for input, additional, target in _batched_generator(
    164             inputs, additional_forward_args, target_ind, internal_batch_size
    165         )

/home/nqtuan/miniconda3/envs/xai/lib/python3.6/site-packages/captum/attr/_core/layer/layer_integrated_gradients.py in gradient_func(forward_fn, inputs, target_ind, additional_forward_args)
    341                 # torch.unbind(forward_out) is a list of scalar tensor tuples and
    342                 # contains batch_size * #steps elements
--> 343                 grads = torch.autograd.grad(torch.unbind(output), inputs)
    344             return grads
    345 

/home/nqtuan/miniconda3/envs/xai/lib/python3.6/site-packages/torch/autograd/__init__.py in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused)
    155     return Variable._execution_engine.run_backward(
    156         outputs, grad_outputs, retain_graph, create_graph,
--> 157         inputs, allow_unused)
    158 
    159 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

bilalsal commented 3 years ago

Hi @ducpvftech ,

Good to see the size mismatch is resolved.

As for the new error you are seeing: maybe set requires_grad = True on your tensors before calling .attribute()?
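
For example (a minimal sketch with placeholder tensors; note this only applies to floating-point inputs such as embeddings, since integer index tensors cannot require gradients):

import torch

# Placeholder for a float input (e.g. precomputed embeddings) passed to .attribute().
inputs = torch.randn(1, 7, 50)
inputs.requires_grad_()  # make sure autograd tracks this tensor

# lig.attribute(inputs, baselines=torch.zeros_like(inputs),
#               target=1, n_steps=50, return_convergence_delta=True)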

PhanDuc commented 3 years ago

Hi @bilalsal , I solved that problem by creating a new forward_fn function:

def forward_fn(sample):

    print("forward_fn: ", sample)
    int2token = [learner.model.vocab._index_to_token["tokens"][i] for i in sample[0].numpy()]
    token2str = " ".join(int2token)

    tokens = [Token(s) for s in token2str.split()]
    instance = learner.dataset_reader.text_to_instance(tokens)

    batch = next(iter(iterator([instance])))
    #intent_probs_tensor = torch.tensor(learner.model.forward_on_instance(instance)['intent_probs'])
    #probs_squeeze = intent_probs_tensor.unsqueeze(0)
    # run the model directly on a batch so the output keeps its grad_fn
    # (forward_on_instance returns numpy arrays, which breaks autograd)
    outputs = learner.model(**batch)
    # print(f"outputs: {outputs}")
    intent_probs_tensor = outputs["intent_probs"]
    print("probs_squeeze: ", intent_probs_tensor.shape)

    return torch.softmax(intent_probs_tensor, dim=-1)

But now I don't know how to combine the two embedding layers.

(text_field_embedder): BasicTextFieldEmbedder(
    (token_embedder_token_characters): TokenCharactersEncoder(
      (_embedding): TimeDistributed(
        (_module): Embedding()
      )
      (_encoder): TimeDistributed(
        (_module): CnnEncoder(
          (_activation): ReLU()
          (conv_layer_0): Conv1d(3, 128, kernel_size=(3,), stride=(1,))
        )
      )
    )
    (token_embedder_tokens): Embedding()
)

For example, after LayerIntegratedGradients the token_embedder_tokens output has size [500, 7, 50], while token_embedder_token_characters is still [1, 7, 128], so they cannot be concatenated.

NarineK commented 3 years ago

@PhanDuc, I think you can probably still get this working without changing the forward function too much. What is the dimensionality of the output tensor? Is it something like #examples x #tokens x #classes? You might need to specify target as a tuple. For the NER tags you would perform attribution for a specific token at a time.
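
For example, with a toy forward function whose output has that shape (the linear projection and index values below are placeholders, not your OneNet model):

import torch
from captum.attr import IntegratedGradients

torch.manual_seed(0)
weight = torch.randn(50, 57)                      # toy stand-in for a tag projection

def toy_forward(embeddings):                      # embeddings: (batch, tokens, 50)
    return embeddings @ weight                    # scores: (batch, tokens, 57)

ig = IntegratedGradients(toy_forward)
embeddings = torch.randn(1, 7, 50)

# A tuple target indexes the non-batch output dimensions:
# (token position, tag index) attributes the score of tag 5 at token 2.
attributions = ig.attribute(embeddings,
                            baselines=torch.zeros_like(embeddings),
                            target=(2, 5),
                            n_steps=10)
print(attributions.shape)                         # torch.Size([1, 7, 50])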

ducpvftech commented 3 years ago

Hi @NarineK ,

The output tensor has the shape #examples x #tokens x #embedding_size.

This is the notebook with what I'm trying so far: https://drive.google.com/file/d/1EeX074cJMDRMONjpTy42dUZxW0_VBer5/view?usp=sharing

The script works with n_steps=1, but if I change to n_steps > 1 there is an error. :(

NarineK commented 3 years ago

@ducpvftech, I've tried to run your example and I don't know exactly what versions of spacy and the other libraries you used. Could you please post the library requirements list? If it works with n_steps=1 and not with n_steps > 1, I guess that the first dimension probably doesn't correspond to the number of examples. What is the dimensionality of torch.softmax(intent_probs_tensor, dim=-1)?

ducpvftech commented 3 years ago

Hi @NarineK , sorry for the errors in the notebook,

The dimensionality of torch.softmax(intent_probs_tensor, dim=-1) is torch.Size([1, 20])

This is the notebook that I carefully checked to make sure it reproduces the error with LayerIntegratedGradients: https://drive.google.com/file/d/1EeX074cJMDRMONjpTy42dUZxW0_VBer5/view?usp=sharing

And when I switched to IntegratedGradients, I solved the n_steps problem: https://colab.research.google.com/drive/1oFN5WwnrnvuGWIZajS6vLhG5sGi9MYlh?usp=sharing

But I still don't understand why LayerIntegratedGradients raised an error and how to fix it. Also, how do I include NER in visualize_text?

Thank you @NarineK !

NarineK commented 3 years ago

@ducpvftech, the reason it doesn't work with n_steps > 1 in LayerIG is that LayerIG internally expands the input, and the problem is here:

def forward_fn(sample):

    print("forward_fn, sample: ", sample)
    int2token = [learner.model.vocab._index_to_token["tokens"][i] for i in sample[0].cpu().data.numpy()]
    token2str = " ".join(int2token)

    tokens = [Token(s) for s in token2str.split()]
    instance = learner.dataset_reader.text_to_instance(tokens)

    batch = next(iter(iterator([instance])))
    #intent_probs_tensor = torch.tensor(learner.model.forward_on_instance(instance)['intent_probs'])
    #probs_squeeze = intent_probs_tensor.unsqueeze(0)
    outputs = learner.model(**batch)
    # print(f"outputs: {outputs}")
    intent_probs_tensor = outputs["intent_probs"]
    print("probs_squeeze: ", intent_probs_tensor.shape)
    print(f"intent_probs_tensor - shape: {torch.softmax(intent_probs_tensor, dim=-1).shape}")
    return torch.softmax(intent_probs_tensor, dim=-1)

In the case of IntegratedGradients and LayerIntegratedGradients, the input gets expanded by a factor of n_steps. If sample has shape [1, 7] it works, but if sample has shape [n_steps, 7] it doesn't, because the code below assumes that sample.shape == [1, 7]:

    int2token = [learner.model.vocab._index_to_token["tokens"][i] for i in sample[0].cpu().data.numpy()]
    token2str = " ".join(int2token)

    tokens = [Token(s) for s in token2str.split()]
    instance = learner.dataset_reader.text_to_instance(tokens)

If you make the changes / expansions accordingly, it should work.

ducpvftech commented 3 years ago

Hi @NarineK , thank you for your advice!

I made the change by expanding the sample shape to [n_steps, 7]:

    def forward_fn(sample):

        if sample.shape[0] == 1:
            ### convert from [1, tokens] to [n_step, tokens]
            # create a copy of data with shape [n_step, 1, tokens]
            y = sample.unsqueeze(0).repeat(n_steps, 1, 1) # for example: y.shape = torch.Size([10, 1, 7])
            # drop the dimension of size 1 -> [10, 7]
            new_sample = torch.squeeze(y)
        else:
            new_sample = sample

        all_probs = []
        try:
            int2tokens = [[onenet_learner.model.vocab._index_to_token["tokens"][w.item()] for w in token] for token in
                      new_sample]
        except:
            print("Error")

        for int2_token in int2tokens:
            token2str = " ".join(int2_token)

            tokens = [Token(s) for s in token2str.split()]
            # expanded the token with n_steps

            instance = onenet_learner.dataset_reader.text_to_instance(tokens)

            batch = next(iter(iterator([instance])))
            outputs = onenet_learner.model(**batch)
            intent_probs_tensor = torch.softmax(outputs["intent_probs"], dim=-1)

            all_probs.append(intent_probs_tensor)

        # convert to tensor
        all_probs_nsteps = torch.cat(all_probs, dim=0)

        return all_probs_nsteps

But it doesn't work; this is the reason:

file: /home/miniconda3/envs/xai/lib/python3.6/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py - line 125

return torch.cat(embedded_representations, dim=-1)

embedded_representations[0] has shape torch.Size([1, 7, 128])

and

embedded_representations[1] has shape torch.Size([10, 7, 50])

The outputs of the two embedding layers differ:

TokenCharactersEncoder(
  (_embedding): TimeDistributed(
    (_module): Embedding()
  )
  (_encoder): TimeDistributed(
    (_module): CnnEncoder(
      (_activation): ReLU()
      (conv_layer_0): Conv1d(3, 128, kernel_size=(3,), stride=(1,))
    )
  )
)

and Embedding()

bilalsal commented 3 years ago

Hi @ducpvftech ,

Is there a way to guarantee that all embedded_representations have the same shape? Would padding the inputs to equal lengths make sense for your use case?

ducpvftech commented 3 years ago

Hi @bilalsal ,

There are two ways to do it:

  1. Overwrite the source code of site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py, or

  2. Somehow concatenate the outputs from both embeddings before the error occurs. I believe this is the best option, but I'm still trying to make it work (a rough sketch of the idea follows below).
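
A rough sketch of the idea behind option 2 (the shapes are taken from the error above; this only illustrates matching the batch dimensions, not a drop-in patch for BasicTextFieldEmbedder):

import torch

# Shapes from the error above: the char-CNN branch produced a batch of 1,
# while the token-embedding branch was expanded to n_steps (here 10) by LayerIG.
char_repr = torch.randn(1, 7, 128)    # token_embedder_token_characters output
tok_repr = torch.randn(10, 7, 50)     # token_embedder_tokens output (expanded)

# Expand the singleton batch so both tensors agree in dim 0, then concatenate
# along the feature dimension as BasicTextFieldEmbedder does.
char_repr = char_repr.expand(tok_repr.shape[0], -1, -1)
combined = torch.cat([char_repr, tok_repr], dim=-1)
print(combined.shape)                 # torch.Size([10, 7, 178]) -- matches the LSTM input size
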
NarineK commented 3 years ago

Hi @ducpvftech, I have been trying the code and it feels unusual. If I set n_steps=1 in your code snippet, it fails.

To be honest, I don't understand why we need to call batch = next(iter(iterator([instance]))) in the forward function. Shouldn't this happen outside? What I don't understand is why, in order to perform a forward pass, we need both sample and the batch inside forward_fn.

I feel that we are making this more complex than it is. Could you please provide a simple Colab notebook that performs a simple prediction for which you'd like to compute feature importance scores?

ducpvftech commented 3 years ago

Hi @NarineK , this is a simple Colab with a simple prediction: https://drive.google.com/file/d/1yc_vngmegzPmmgHAb3VV21r61xLZTjne/view?usp=sharing (please don't forget that you must restart the runtime in order to use the newly installed versions).

I re-ran the code in the previous notebook (LayerIntegratedGradients) with n_steps=1 without any problem: https://drive.google.com/file/d/1EeX074cJMDRMONjpTy42dUZxW0_VBer5/view?usp=sharing

Thank you for helping me with this issue @NarineK !

NarineK commented 3 years ago

Thank you, @ducpvftech! I think the challenge is how to convert token indices into an Instance. If we cannot do that in a natural way, then I think it is better to use IntegratedGradients instead of LayerIntegratedGradients, because the inputs to attribute() must be tensors; other data types are not supported.

input_indices = torch.tensor([learn.model.vocab._token_to_index["tokens"][word] for word in tokens])

I see that here you created input_indices and didn't use it. Does that mean we cannot convert input_indices into an Instance naturally? https://colab.research.google.com/drive/1oFN5WwnrnvuGWIZajS6vLhG5sGi9MYlh?usp=sharing

Basically, I want to pass precomputed input_indices instead of internally re-computing them.

Converting input_indices back to text most probably won't help because the gradients will not get propagated accordingly.
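
As a self-contained illustration of that point (a toy embedding and classifier, not your OneNet model): gradients propagate when the forward function consumes precomputed embedding tensors directly.

import torch
from captum.attr import IntegratedGradients

torch.manual_seed(0)
embedding = torch.nn.Embedding(100, 50)           # toy vocabulary of 100 tokens
classifier = torch.nn.Linear(50, 20)              # toy intent head (20 intents)

def forward_on_embeddings(embs):                  # embs: (batch, seq_len, 50)
    return torch.softmax(classifier(embs.mean(dim=1)), dim=-1)

input_indices = torch.tensor([[3, 7, 11, 2, 5, 9, 1]])
input_embeddings = embedding(input_indices)       # precomputed, still differentiable

ig = IntegratedGradients(forward_on_embeddings)
attributions = ig.attribute(input_embeddings,
                            baselines=torch.zeros_like(input_embeddings),
                            target=1, n_steps=10)
print(attributions.shape)                         # torch.Size([1, 7, 50])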

ducpvftech commented 3 years ago

Hi @NarineK, thank you for your feedback and sorry for the late response,

In the Colab link you sent, I don't think using input_indices will make any difference, since I passed an embedding to the forward function; this works without any problem for any input sentence.

instance = learn.dataset_reader.text_to_instance(tokens_allennlp)
batch = next(iter(iterator([instance])))
print("input_indices: ", input_indices)
input_embedding = interpretable_embedding.indices_to_embeddings(batch["tokens"])

And in the past, I have tried using input_indices only, but got an error.

So, I think using IntegratedGradients suits me well.

In my case, the model returns multiple outputs (intent & NER). How can I make IntegratedGradients work for both of them? I mean, there would be an explanation for intent and an explanation for NER.

NarineK commented 3 years ago

Hi @ducpvftech, yes, I understand that in your example you get it working without input_indices. I brought it up as an example to explain LayerIG: LayerIG needs torch tensors as the inputs that we want to interpret. W.r.t. NER, is the NER prediction score stored in tag_logits? I think there are two options:

  1. You attribute the NER score to the inputs of the model, similar to what you do for intent now, in a separate ig.attribute call with a slightly modified forward function that returns the NER score. tag_logits has dimension 1 x num_tokens x num_embeddings, right? I think we can sum across the num_embeddings dimension and attribute to each tag. Basically, your modified forward function will return the summed score for each token, one at a time in a loop, or you could also do that in a batch; a loop would be easier for a first try (a rough sketch follows after this list).
  2. If you want to combine intent and NER and call attribute once on something that represents the summation of intent and NER, you can, for example, sum the intent and NER scores together in the forward function, return that score, and attribute w.r.t. that summed score.
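
A toy sketch of option 1 (the linear layer and shapes below stand in for the model; in practice the forward function would run OneNet and slice outputs["tag_logits"], and you could sum the scores as described above instead of picking a single tag):

import torch
from captum.attr import IntegratedGradients

torch.manual_seed(0)
tag_proj = torch.nn.Linear(50, 57)                 # toy stand-in for the tag projection

def make_tag_forward(token_idx):
    # Return a forward that outputs the tag scores of one token position: (batch, 57).
    def forward_fn(embeddings):                    # embeddings: (batch, num_tokens, 50)
        return tag_proj(embeddings)[:, token_idx, :]
    return forward_fn

embeddings = torch.randn(1, 7, 50)                 # stand-in for the embedded sentence
baselines = torch.zeros_like(embeddings)
predicted_tags = [1, 2, 0, 3, 4, 0, 0]             # illustrative tag indices, one per token

# Option 1: loop over token positions, attributing each token's predicted tag.
per_token_attrs = []
for pos, tag in enumerate(predicted_tags):
    ig = IntegratedGradients(make_tag_forward(pos))
    per_token_attrs.append(
        ig.attribute(embeddings, baselines=baselines, target=tag, n_steps=10))
print(per_token_attrs[0].shape)                    # torch.Size([1, 7, 50])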

Does this make sense? Thank you, Narine

PhanDuc commented 3 years ago

Thank you @NarineK ,

tag_logits has dimension num_tokens x 57, where 57 is the number of tag classes.

It is a good idea to compute both intent and NER at once. Both of your options are good starting points for me! :)

I will get back to you soon ;)