
How to get the attention weights (encoder, decoder, or cross) after fine-tuning CodeBERT for the Code-To-Text task? #119

Closed Tamal-Mondal closed 2 years ago

Tamal-Mondal commented 2 years ago

Hello Team,

I am using the CodeXGLUE pipeline for the Code-To-Text task, and I want to visualize attention to understand the relation between input and output. I have fine-tuned the CodeBERT model on our own dataset for the code2nl task and can make predictions with it. For visualization, I am using the popular BertViz library (https://github.com/jessevig/bertviz), which takes attention weights and source tokens as input. Typically, for BERT, we can get the attention weights in the following way:

outputs = model(inputs)
attention = outputs[-1]
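
For reference, here is a fuller, self-contained sketch of that pattern (the bert-base-uncased checkpoint and the input string are placeholders; the same pattern applies to microsoft/codebert-base):

from bertviz import head_view
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("def add(a, b): return a + b", return_tensors="pt")
outputs = model(**inputs)
attention = outputs[-1]  # tuple with one (batch, heads, seq, seq) tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_view(attention, tokens)  # renders the BertViz head view in a notebook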

But in our case the encoder (CodeBERT here) does not seem to return attention weights, judging by the following output (I just printed the encoder output):

[screenshot of the printed encoder output, showing only two elements: the last hidden state and the pooled output]

I think the reason is that the config we use comes from the CodeBERT repo on Hugging Face, and it does not set the parameter "output_attentions=True", which we could otherwise pass when loading the model with "from_pretrained". I have tried adding this parameter alongside the existing config while loading CodeBERT as the encoder, but I get an "Unknown Parameter" error.

Can you please tell me how we can get the attention weights for the CodeBERT model in this case after fine-tuning? Also, please correct me if I have said anything wrong.

Thanks in advance.

Regards, Tamal Mondal

guoday commented 2 years ago

For the encoder, you can pass "output_attentions=True" into the CodeBERT forward function to get the attention weights. Change https://github.com/microsoft/CodeXGLUE/blob/e5f1b15cc33e9a2e77821005f77da88175e0397a/Code-Text/code-to-text/code/model.py#L55 to:

outputs = self.encoder(source_ids, attention_mask=source_mask, output_attentions=True)
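
Note that the Seq2Seq forward in model.py consumes the encoder outputs internally and does not return the attentions. A minimal sketch of one way to expose them (the attribute name encoder_attentions is just illustrative, not part of the repo):

outputs = self.encoder(source_ids, attention_mask=source_mask, output_attentions=True)
# With output_attentions=True and no hidden states requested, the attention
# probabilities are the third element of the returned tuple.
self.encoder_attentions = outputs[2]  # read this off the model after a forward pass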

For the decoder and cross-attention, we use PyTorch's official transformer modules (nn.TransformerDecoder). There doesn't seem to be an API to get the attention weights from them.
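
One possible workaround, sketched below under two assumptions: the fine-tuned Seq2Seq model exposes its nn.TransformerDecoder as model.decoder (as in this repo's model.py), and the installed PyTorch version calls the attention sublayers with need_weights enabled; otherwise the hooks will capture None.

attn_store = {}

def make_hook(name):
    def hook(module, inputs, output):
        # nn.MultiheadAttention returns (attn_output, attn_weights);
        # attn_weights is None when the layer is called with need_weights=False.
        attn_store[name] = output[1]
    return hook

handles = []
for i, layer in enumerate(model.decoder.layers):
    handles.append(layer.self_attn.register_forward_hook(make_hook("self_%d" % i)))
    handles.append(layer.multihead_attn.register_forward_hook(make_hook("cross_%d" % i)))

# ... run one forward pass through the fine-tuned model, then inspect attn_store ...

for handle in handles:
    handle.remove()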

Tamal-Mondal commented 2 years ago

Hi @guoday ,

Thanks for the quick reply. I tried that approach but don't see a difference in the CodeBERT output when testing with some sample input and the fine-tuned code-to-text model. The encoder output still has only two parts: the last hidden layer output (n × 256 × 768) and the pooled output (n × 768).

Follow-up question: did you mean that I have to set "output_attentions=True" in model.py before I fine-tune CodeBERT for the code2nl task?

Thanks & Regards, Tamal Mondal

guoday commented 2 years ago

outputs = self.encoder(source_ids, attention_mask=source_mask, output_attentions=True)
print(len(outputs[2]))
print([x.shape for x in outputs[2]])

Can you check again, or update your transformers package? I tried the above code, and the output is the attention probabilities of all 12 layers:

12
[torch.Size([32, 12, 256, 256]), torch.Size([32, 12, 256, 256]), torch.Size([32, 12, 256, 256]), torch.Size([32, 12, 256, 256]), torch.Size([32, 12, 256, 256]), torch.Size([32, 12, 256, 256]), torch.Size([32, 12, 256, 256]), torch.Size([32, 12, 256, 256]), torch.Size([32, 12, 256, 256]), torch.Size([32, 12, 256, 256]), torch.Size([32, 12, 256, 256]), torch.Size([32, 12, 256, 256])]
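
For the BertViz use case in the original question: each tensor is batch_size × num_heads × seq_len × seq_len, and head_view expects a batch dimension of 1, so you would slice out a single example first, e.g.:

attention = outputs[2]                      # tuple with one tensor per layer
example_attn = [a[0:1] for a in attention]  # keep a batch dimension of size 1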

Tamal-Mondal commented 2 years ago

Thanks a lot, @guoday, understood now. I tried the approach and it worked.

I am closing the issue.

Regards, Tamal Mondal