jessevig / bertviz

BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1
Apache License 2.0

How can I use BertViz for BERT Question Answering? #26

Closed · bvy007 closed this issue 4 years ago

bvy007 commented 4 years ago

Is there any way to see the attention visualization for a BERT question answering model? I couldn't find a BertForQuestionAnswering class in bertviz.pytorch_transformers_attn. I have fine-tuned a model on a QA dataset using the Hugging Face transformers library and would like to visualize its attention. Can you suggest a way of doing this?

jessevig commented 4 years ago

Hi, I've recently pushed some significant changes that enable you to run any model from the transformers library. If you are using the head view, you should be able to adapt the following notebook for your use case: https://github.com/jessevig/bertviz/blob/master/head_view_bert.ipynb Please let me know if that works for you.

bvy007 commented 4 years ago

Thanks for the quick reply. I tried it out, but it didn't show anything. I did the following:

from bertviz import head_view
from transformers import BertTokenizer, BertForQuestionAnswering

%%javascript
require.config({
  paths: {
      d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min',
      jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',
  }
});

def show_head_view(model, tokenizer, sentence_a, sentence_b=None):
    inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt', add_special_tokens=True)
    token_type_ids = inputs['token_type_ids']
    input_ids = inputs['input_ids']
    attention = model(input_ids, token_type_ids=token_type_ids)[-1]  # attentions are the last element of the output when output_attentions=True
    input_id_list = input_ids[0].tolist() # Batch index 0
    tokens = tokenizer.convert_ids_to_tokens(input_id_list)
    if sentence_b:
        sentence_b_start = token_type_ids[0].tolist().index(1)
    else:
        sentence_b_start = None
    head_view(attention, tokens, sentence_b_start)

do_lower_case = True
model = BertForQuestionAnswering.from_pretrained('bert_output/', output_attentions=True)
tokenizer = BertTokenizer.from_pretrained('bert_output/', do_lower_case=do_lower_case)
sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
show_head_view(model, tokenizer, sentence_a, sentence_b)

I also tried loading the model via state_dict, but this time it threw an error:

import torch
model_version = 'bert-base-uncased'
do_lower_case = True
model_state_dict = torch.load('bert_output/pytorch_model.bin')
model = BertForQuestionAnswering.from_pretrained(model_version, state_dict=model_state_dict)
tokenizer = BertTokenizer.from_pretrained('bert_output/', do_lower_case=do_lower_case)
sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
show_head_view(model, tokenizer, sentence_a, sentence_b)

ERROR:

<ipython-input-6-708daaadcd2d> in show_head_view(model, tokenizer, sentence_a, sentence_b)
     10     else:
     11         sentence_b_start = None
---> 12     head_view(attention, tokens, sentence_b_start)

/content/gdrive/My Drive/bertviz/bertviz/head_view.py in head_view(attention, tokens, sentence_b_start, prettify_tokens)
     58         slice_b = slice(sentence_b_start, len(tokens))  # Position corresponding to sentence B in input
     59         attn_data['aa'] = {
---> 60             'attn': attn[:, :, slice_a, slice_a].tolist(),
     61             'left_text': tokens[slice_a],
     62             'right_text': tokens[slice_a]

IndexError: too many indices for tensor of dimension 2

Here 'bert_output/' is a directory containing the model ('pytorch_model.bin') that I fine-tuned on a specific dataset. Could you please correct me if I am doing something wrong?

jessevig commented 4 years ago

Hmm, I suspect the model is not getting loaded correctly. I believe the directory might need a configuration file as well, though I can't recall. Are you able to display the value of the attention that is returned when you call your model?
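
For reference, a minimal sketch of how a fine-tuning script would typically save the model so that 'bert_output/' contains both the weights and config.json (assuming model and tokenizer are the fine-tuned BertForQuestionAnswering and its tokenizer; this is just the standard transformers save_pretrained API, not something specific to this thread):

# Save weights (pytorch_model.bin) and configuration (config.json) so that
# BertForQuestionAnswering.from_pretrained('bert_output/') can rebuild the model.
model.save_pretrained('bert_output/')
# Save the vocabulary/tokenizer files alongside, so that
# BertTokenizer.from_pretrained('bert_output/') also works.
tokenizer.save_pretrained('bert_output/')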

bvy007 commented 4 years ago

I can see values when printing the attention variable in show_head_view():

[screenshot: printed attention values]

jessevig commented 4 years ago

Okay, I think I see the problem in the second version. You need to set output_attentions=True. Want to try again with that change and let me know if it works?

bvy007 commented 4 years ago

I tried the following:

import torch
model_version = 'bert-base-uncased'
do_lower_case = True
model_state_dict = torch.load('bert_output/pytorch_model.bin')
model = BertForQuestionAnswering.from_pretrained(model_version, state_dict=model_state_dict, output_attentions=True)
tokenizer = BertTokenizer.from_pretrained('bert_output/', do_lower_case=do_lower_case)
sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
show_head_view(model, tokenizer, sentence_a, sentence_b)

But it ended up showing nothing in the view, as in the figure below.

[screenshot: empty visualization output]

I got the same output in the first version :(

jessevig commented 4 years ago

Are you able to display the value of the attention returned again?

bvy007 commented 4 years ago

Yes, I could see all the attention values without any error this time.

[screenshot: printed attention values]

bvy007 commented 4 years ago

Do I need to check the JavaScript settings (d3) for the visualization?

jessevig commented 4 years ago

The attention looks good.

It might relate to the JavaScript settings. Do you still have the following at the top?

%%javascript
require.config({
  paths: {
      d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min',
      jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',
  }
});

I'm working on a change where this is not needed, but currently it is. Sometimes there's some weirdness where you have to refresh your browser. Which browser are you using by the way?

bvy007 commented 4 years ago

Yes, the JavaScript code is still at the top.

Yeah, I was thinking the same. Maybe there is some problem with my browser. I am currently using Google Chrome.

jessevig commented 4 years ago

Hmm, not sure whether the problem is with the browser or not. Are you able to run the original version of the notebook without using your fine-tuned model?

bvy007 commented 4 years ago

I guess there is a problem with the browser. The fine-tuned model is also giving me similar results.

[screenshot: empty visualization output]

jessevig commented 4 years ago

Occasionally it doesn't work the first time. Could you try re-running from the beginning? If that doesn't work, could you close and re-open your browser? If that still doesn't work, could you check the JavaScript console for any errors?

bvy007 commented 4 years ago

Sure. Thank you so much! :)

bvy007 commented 4 years ago

Hi, I found this error in the web browser:

output_binary.js?vrz=colab-20191126-082400-RC00_282570569:75 Error evaluating Javascript output:  ReferenceError: requirejs is not defined
    at eval (eval at <anonymous> (output_binary.js?vrz=colab-20191126-082400-RC00_282570569:NaN), <anonymous>:12:1)
    at eval (<anonymous>)
    at ta.eval [as c] (output_binary.js?vrz=colab-20191126-082400-RC00_282570569:75)
    at va (output_binary.js?vrz=colab-20191126-082400-RC00_282570569:14)
    at xa.next (output_binary.js?vrz=colab-20191126-082400-RC00_282570569:14)
    at eval (output_binary.js?vrz=colab-20191126-082400-RC00_282570569:15)
    at new Promise (<anonymous>)
    at ya (output_binary.js?vrz=colab-20191126-082400-RC00_282570569:15)
    at y (output_binary.js?vrz=colab-20191126-082400-RC00_282570569:15)
    at td (output_binary.js?vrz=colab-20191126-082400-RC00_282570569:74)
jessevig commented 4 years ago

I see, thanks. So you're running on Colab? Sorry, I was assuming you were running it on your local machine with a Jupyter notebook. Colab requires a different JavaScript setup, as shown here: https://colab.research.google.com/drive/1PEHWRHrvxQvYr9NFRC-E_fr3xDq1htCj#scrollTo=Mv6H9QK9yLLe

Instead of:

%%javascript
require.config({
  paths: {
      d3: '//cdnjs.cloudflare.com/ajax/libs/d3/3.4.8/d3.min',
      jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',
  }
});

You need:

def call_html():
  import IPython
  display(IPython.core.display.HTML('''
    <script>
      requirejs.config({
        paths: {
          base: '/static/base',
          "d3": "https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.8/d3.min",
          jquery: '//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min',
        },
      });
    </script>
    '''))

And then you need to call call_html() right before you invoke head_view().
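
For reference, a minimal sketch of what the end of the show_head_view() function from earlier in the thread would look like with that extra call added:

    if sentence_b:
        sentence_b_start = token_type_ids[0].tolist().index(1)
    else:
        sentence_b_start = None
    call_html()  # injects the requirejs.config(...) <script> tag into the Colab output area
    head_view(attention, tokens, sentence_b_start)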

Please let me know if that works. Sorry, I'm working on making this more uniform between platforms. JavaScript is always a challenge, especially in Colab!

bvy007 commented 4 years ago

Yeah, it works now. Thank you so much!! :)

jessevig commented 4 years ago

Awesome, glad to hear. Working on changes to make this easier in the future. Thanks.

bvy007 commented 4 years ago

One more question: sometimes the Colab notebook throws a "Runtime disconnected" error when using long sequences. Is that common?

jessevig commented 4 years ago

I hadn't seen that before, but I'm not surprised that it might happen for very long sequences. How long are the sequences?

jessevig commented 4 years ago

Also, do you know whether that error happens after the attentions have been returned, or before? I'm assuming it probably happens between the point where the attention is computed and where the visualization is rendered, but I'm curious.

bvy007 commented 4 years ago

Length of sentence_a = 40; length of sentence_b = 225.

Yes, it happens between the attention calculation and the rendering.

bvy007 commented 4 years ago

sentence_a = "who plays lady talisa in game of thrones" sentence_b = "

Oona Castilla Chaplin ( ˈuna kasˈtija ˈt͡ʃaplin ) ( born 4 June 1986 ) , known professionally as Oona Chaplin , is a Spanish actress . Her roles include Talisa Maegyr in the HBO TV series Game of Thrones , The Crimson Field and the series Taboo .

"

jessevig commented 4 years ago

Thanks for sharing that. Unfortunately, the tool doesn't currently scale well to long texts.
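
As a rough, back-of-the-envelope illustration (my own arithmetic, not from the thread): for BERT-base (12 layers, 12 heads), the head view has to serialize 12 × 12 × seq_len × seq_len attention weights into the page, so at seq_len = 256 word-piece tokens that is already about 9.4 million values, which can easily strain the browser or the Colab output frame. One possible workaround, purely an assumption on my part rather than something suggested in the thread, is to truncate the passage before visualizing via encode_plus's max_length argument (the exact truncation arguments vary by transformers version):

inputs = tokenizer.encode_plus(
    sentence_a,
    sentence_b,
    return_tensors='pt',
    add_special_tokens=True,
    max_length=128,   # hypothetical budget; keeps the attention tensors small
    truncation=True,  # older transformers versions use truncation_strategy instead
)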