Closed by xiaoda99, 3 years ago
Hi @xiaoda99, thank you for reminding us about it :) We'll find some time and add Part 2. What type of BERT tasks are you working on? If you find any papers or resources related to that, feel free to post them here :) Thank you!
I'm working on attributing model predictions to the attention weights of ALL self-attention layers, i.e. identifying which attention links matter most for the model's predictions. My preferred attribution method is integrated gradients. Two related papers I want to follow (with relevant section numbers):

[1] Hao, Y., Dong, L., Wei, F., & Xu, K. (2020). Self-Attention Attribution: Interpreting Information Interactions Inside Transformer. arXiv preprint arXiv:2004.11207. (Section 4)
[2] Cui, L., Cheng, S., Wu, Y., & Zhang, Y. (2020). Does BERT Solve Commonsense Task via Commonsense Knowledge? arXiv preprint arXiv:2008.03945. (Sections 4.2-5.2)
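For anyone landing here, the core idea can be illustrated without BERT at all. The sketch below is a minimal, self-contained toy (not Captum's or the papers' implementation): `toy_attention_output` is a made-up scalar "prediction" computed from softmax-ed attention logits, and `integrated_gradients` approximates the IG path integral with a Riemann sum over finite-difference gradients. All function names and numbers are hypothetical; in practice you would attribute to real attention tensors with autograd (e.g. Captum's `LayerIntegratedGradients`), as in paper [1].

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def toy_attention_output(logits):
    # hypothetical scalar "prediction": attention-weighted sum of fixed values
    values = [0.2, 1.5, -0.7, 0.9]
    attn = softmax(logits)
    return sum(a * v for a, v in zip(attn, values))

def integrated_gradients(f, x, baseline, steps=200):
    # IG_i = (x_i - baseline_i) * integral over alpha in (0,1] of
    #        dF/dx_i evaluated at baseline + alpha*(x - baseline),
    # approximated by a Riemann sum; gradients via central differences.
    grads_sum = [0.0] * len(x)
    eps = 1e-5
    for k in range(1, steps + 1):
        alpha = k / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(len(x)):
            plus, minus = point[:], point[:]
            plus[i] += eps
            minus[i] -= eps
            grads_sum[i] += (f(plus) - f(minus)) / (2 * eps)
    return [(xi - b) * g / steps for xi, b, g in zip(x, baseline, grads_sum)]

scores = [0.5, 2.0, -1.0, 0.3]   # raw attention logits (pre-softmax), made up
baseline = [0.0] * len(scores)   # zero-logit baseline = uniform attention
attr = integrated_gradients(toy_attention_output, scores, baseline)
print([round(a, 4) for a in attr])
```

A useful sanity check is IG's completeness axiom: the attributions should sum (up to discretization error) to `f(x) - f(baseline)`, so the per-link scores really do decompose the change in the prediction.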
Thank you @xiaoda99, I'll look into them.
BERT tutorial Part 2 PR: https://github.com/pytorch/captum/pull/593
Closing, addressed in PR: #593
📚 Documentation
Interpreting BERT Models (Part 1) (https://captum.ai/tutorials/Bert_SQUAD_Interpret). At the end of this tutorial it says: "In Part 2 of this tutorial we will go deeper into attention layers and heads, compare the attributions with the attention weight matrices, and study and discuss related statistics". That's exactly the topic I want to read, but I can't find Part 2 anywhere on the web. If Part 2 hasn't been finished yet, when will it be available?