Information
The question or comment is about chapter:
Question or comment
Hi
First of all, thank you for writing this amazing book. I have a question about the attention mechanism in Transformers (referring to page 61). I am trying to compare the meaning and mechanism of what is called self-attention in Transformers with what I previously knew as self-attention from this paper: https://aclanthology.org/N16-1174.pdf, and with the local and general attention from this one: https://arxiv.org/pdf/1508.04025.pdf. In those papers, the HAN model used self, local, or global attention on top of RNN, GRU, LSTM, or CNN layers. Since the Transformer is a new architecture, I am wondering whether the mathematics behind its attention is the same as in these two papers or not.
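To make the comparison concrete, here is a minimal NumPy sketch of how I currently understand the three formulations. The function names, shapes, and random weights are just my own illustration (not from the book or the papers' code); only the formulas follow the cited papers and the original Transformer paper:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def transformer_self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (Vaswani et al., 2017).
    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # every token scores every token
    return softmax(scores, axis=-1) @ V          # (seq_len, d_k)

def han_word_attention(H, W, b, u):
    """Additive attention from Yang et al. (2016, HAN).
    H: (seq_len, d_h) hidden states from a GRU/LSTM; u: learned context vector."""
    U = np.tanh(H @ W + b)                       # (seq_len, d_a)
    alpha = softmax(U @ u, axis=-1)              # one weight per time step
    return alpha @ H                             # (d_h,) sentence vector

def luong_general_score(h_t, H_s, Wa):
    """Luong et al. (2015) 'general' score: h_t^T W_a h_s for each source state h_s."""
    return H_s @ (Wa @ h_t)                      # (src_len,)

# Hypothetical shapes, just to check that the pieces fit together
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                      # 5 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(transformer_self_attention(X, Wq, Wk, Wv).shape)   # (5, 4)
```

My (possibly wrong) takeaway is that the weighted-sum idea is shared, but the Transformer scores learned query/key projections of the same sequence against each other, whereas the HAN and Luong mechanisms score RNN hidden states against a learned context vector or a decoder state. Please correct me if this reading is off.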
Please forgive me if the question seems very basic to you.
Regards,
Shabnam