thunlp / PLMpapers

Must-read Papers on pre-trained language models.
MIT License

New candidate for analysis papers: Transformer Dissection #1

Closed slvher closed 5 years ago

slvher commented 5 years ago

Transformer Dissection: An Unified Understanding for Transformer’s Attention via the Lens of Kernel https://arxiv.org/abs/1908.11775

It offers another perspective on the attention formulation of the Transformer: attention can be understood as a kernel smoother over the input sequence.
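As a minimal sketch of that kernel view (my reading of the paper, not its official code): standard softmax attention is equivalent to a normalized kernel smoother with an exponential kernel over query-key pairs. The function names and shapes below are illustrative assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard scaled dot-product attention (Vaswani et al., 2017).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def kernel_attention(Q, K, V, kernel):
    # Attention as kernel smoothing: each output row is a
    # kernel-weighted average of the values, normalized over all keys.
    G = np.array([[kernel(q, k) for k in K] for q in Q])
    return (G / G.sum(axis=-1, keepdims=True)) @ V

# The (asymmetric) exponential kernel recovers softmax attention exactly,
# since the per-row normalization cancels the max-subtraction trick.
exp_kernel = lambda q, k: np.exp(q @ k / np.sqrt(len(q)))

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out_softmax = softmax_attention(Q, K, V)
out_kernel = kernel_attention(Q, K, V, exp_kernel)
```

Swapping `exp_kernel` for other kernels (e.g. symmetric or distance-based ones) then gives attention variants, which is the unifying angle the paper takes.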