nlp-with-transformers / notebooks

Jupyter notebooks for the Natural Language Processing with Transformers book
https://transformersbook.com/
Apache License 2.0

Self Attention #47

Open ShabRa1365 opened 2 years ago

ShabRa1365 commented 2 years ago

Information

The question or comment is about chapter:

Question or comment

Hi

I would like to first say thanks for writing this amazing book, and then ask a question about the attention mechanism in Transformers (referring to page 61). I am trying to compare the meaning and mechanism of what is called self-attention in Transformers with what I previously knew as self-attention from this paper: https://aclanthology.org/N16-1174.pdf, and with local and global attention from this one: https://arxiv.org/pdf/1508.04025.pdf. What was used in those papers was a HAN model with self, local, or global attention on top of RNN, GRU, LSTM, or CNN layers. Since Transformers are a new architecture, I am wondering whether the mathematics behind their attention is the same as in these two papers or not?
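
To make the comparison concrete, this is roughly how I understand the scaled dot-product self-attention described around page 61 (a minimal PyTorch sketch of my own, not the book's exact code, with made-up tensor shapes just for illustration):

```python
import torch
import torch.nn.functional as F
from math import sqrt

def scaled_dot_product_attention(query, key, value):
    # query, key, value: [batch_size, seq_len, head_dim]
    dim_k = key.size(-1)
    # dot-product similarity of every token with every other token,
    # scaled by sqrt(head_dim)
    scores = torch.bmm(query, key.transpose(1, 2)) / sqrt(dim_k)
    # attention weights sum to 1 over the key positions
    weights = F.softmax(scores, dim=-1)
    # output is a weighted average of the value vectors
    return torch.bmm(weights, value)

# e.g. with random tensors standing in for token embeddings
q = k = v = torch.randn(1, 5, 64)
out = scaled_dot_product_attention(q, k, v)  # shape [1, 5, 64]
```

In contrast, the papers I linked score each RNN/GRU/LSTM hidden state with an additive (tanh + learned context vector) function before the softmax, so I would like to understand whether these are mathematically equivalent or just related ideas.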

Please forgive me if the question seems very basic to you.

Regards, Shabnam