hila-chefer / Transformer-Explainability

[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer-based networks.

Reproducibility in another text dataset #17

Closed — carlosabcs closed this issue 3 years ago

carlosabcs commented 3 years ago

Hi @hila-chefer,

First of all, congratulations on your paper; the method you present is really great!

I am trying to apply your method as an explainability mechanism for the BERT model I am using to classify the polarity (sentiment) expressed in a set of tweets. My goal is to use your method to obtain the parts of the text that led to each prediction. Basically, I have a BertForSequenceClassification model that predicts one of three target classes, and I have also saved the attention arrays returned by the model using output_attentions=True.
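
Roughly, my setup looks like this (a simplified sketch; "bert-base-uncased" is just a stand-in for my fine-tuned checkpoint):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Stand-in: in my case I load my fine-tuned 3-class sentiment checkpoint here.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3, output_attentions=True
)
model.eval()

encoding = tokenizer("some tweet text here", return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)
logits = outputs.logits           # shape (1, 3)
attentions = outputs.attentions   # per-layer attention maps, each (1, heads, seq, seq)
```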

I have reviewed the code from bert_pipeline.py, and also BERT-explainability.ipynb, but I still can't quite figure out how to apply it.

I was thinking of using the attentions I got from the trained model, but it seems that would not be possible, because in generate_LRP you make use of .attention.self.get_attn_gradients() and .attention.self.get_attn_cam(). Right?

In any case, what I would have to do is set up the pre-trained model, my tweets, and the outputs, and then pass all of that, along with my model, to generate_LRP. Is that right?

Could you please help me with this?

hila-chefer commented 3 years ago

Hi @carlosabcs, thanks for your kind words and for your interest in our work!

I actually have 2 suggestions:

  1. Assuming your code is really based only on BertForSequenceClassification, you can just use our colab as is! I.e., simply load your own weights instead of the pretrained weights I loaded there, and use an example from your tweets instead of my free-form text.
  2. If you do need to implement it from scratch, you can use our new method, which has no LRP, so implementation is easier: all you need are the gradients and the raw attention maps, which are easy to obtain (one backward pass, and you can use a backward hook for the gradients). See the rough sketch right after this list.
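
To make option 2 concrete, here is a rough sketch of the idea on a plain HuggingFace BertForSequenceClassification (this is not the exact code from this repo, and the combination step is only a simplified approximation of our method):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Plain HuggingFace model as a stand-in for your fine-tuned checkpoint.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
model.eval()

enc = tokenizer("example tweet text", return_tensors="pt")
outputs = model(**enc, output_attentions=True)
attentions = outputs.attentions               # one (1, heads, seq, seq) map per layer

# Keep gradients for the (non-leaf) attention tensors instead of registering hooks.
for attn in attentions:
    attn.retain_grad()

# One backward pass from the score of the predicted (or any chosen) class.
target = outputs.logits.argmax(dim=-1).item()
outputs.logits[0, target].backward()

# Combine gradients with the attention maps: grad * attention, positive part,
# averaged over heads, then accumulated across layers (rollout-style).
seq_len = attentions[0].shape[-1]
relevance = torch.eye(seq_len)
for attn in attentions:
    cam = (attn.grad * attn).clamp(min=0).mean(dim=1)[0]   # (seq, seq)
    relevance = relevance + cam @ relevance

token_scores = relevance[0]   # relevance of each token w.r.t. the [CLS] position
```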

Given the information you provided, I think option 1 should be effortless for you and should work.

Do let me know if you have any more questions :)

carlosabcs commented 3 years ago

You're welcome :)

I'm going to try the first option and will let you know how it turns out. Thank you very much for your quick response!

hila-chefer commented 3 years ago

Closing this issue for now. If other problems arise please reopen.

carlosabcs commented 3 years ago

Hi @hila-chefer, it worked!!

Just one more question: how can I run this script locally? Should I just add it to the main folder of the cloned project on my computer?

Oh, and one more thing: I don't know if you've tried to run your notebook again, but I had to add model.to("cuda"); otherwise, an error appears.

hila-chefer commented 3 years ago

Great to hear that, @carlosabcs! Feel free to share your results :) Also, please consider citing our paper if you use our method :)

To address your questions: the script from our notebook should work if you add it to the main folder of your cloned repository. As for the error you encountered, I actually ran the notebook several times and didn't get it, but I'll be sure to check again and fix the issue.
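
In the meantime, if the device error shows up again, a minimal workaround (assuming model and encoding are whatever you built in the notebook) is to keep the model and the inputs on the same device:

```python
import torch

# Assumes `model` and `encoding` are the ones created earlier in the notebook.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
encoding = {k: v.to(device) for k, v in encoding.items()}  # move input tensors too
```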

carlosabcs commented 3 years ago

Yeah, of course, I'll cite your paper!

I have another question: is there a way I could use this method with BertModel?

I mean, I have another dataset, for fake news detection, where I use the BERT representations obtained from BertModel rather than BertForSequenceClassification, because I combine those representations with topological features of the news diffusion network and then use the combined vector as the feature vector for classification (with traditional classification algorithms: RF, LR, SVM, etc.). This is our paper (I'm sorry it's not in English 😢). So I would like to use the attention weights obtained from your method to visualize which parts of each news item received the most attention from BertModel.
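
In case it helps, this is roughly how I build the combined feature vector (a simplified sketch with placeholder data and variable names):

```python
import numpy as np
import torch
from sklearn.ensemble import RandomForestClassifier
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

def text_embedding(text):
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = bert(**enc)
    return out.last_hidden_state[0, 0].numpy()  # [CLS] representation

# Placeholder data: news texts, their diffusion-network features, and labels.
news_texts = ["example news item one", "example news item two"]
topological_features = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
labels = [0, 1]

X = np.array([
    np.concatenate([text_embedding(text), topo])
    for text, topo in zip(news_texts, topological_features)
])
clf = RandomForestClassifier().fit(X, labels)
```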

hila-chefer commented 3 years ago

@carlosabcs the answer is probably yes. Our method can provide an attention-like matrix that reflects the impact of each token on every other token. The structure of your model isn't entirely clear to me, but assuming you concatenate another model onto BERT, you would need to implement LRP for the entire network to use the method from this paper. If you wish to avoid that, you can use the new method I mentioned; in that case you only need to propagate gradients, which is much easier. In any case, a good first step would be to check whether the output of our explainability method suits your needs for this specific scenario.
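
Whichever variant you end up using, the output you want is a per-token score that you can map back to the text, roughly along these lines (here relevance is just a random stand-in for the attention-like matrix, only to show the shapes):

```python
import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("example news text", return_tensors="pt")
seq_len = enc["input_ids"].shape[1]

# Stand-in for the (seq, seq) attention-like matrix from the explainability method.
relevance = torch.rand(seq_len, seq_len)

token_scores = relevance[0]                        # impact of each token on the [CLS] position
token_scores = token_scores / token_scores.max()   # normalize to [0, 1] for visualization
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
for tok, score in zip(tokens, token_scores.tolist()):
    print(f"{tok}\t{score:.3f}")
```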