thunlp / PLMpapers

Must-read Papers on pre-trained language models.
MIT License

New candidate for analysis papers: Transformer Dissection #1

Closed slvher closed 5 years ago

slvher commented 5 years ago

Transformer Dissection: An Unified Understanding for Transformer’s Attention via the Lens of Kernel https://arxiv.org/abs/1908.11775

It offers another perspective on the attention formulation of the Transformer: attention can be understood as a kernel smoother over the input sequence.
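As a minimal sketch of that kernel view (my reading of the paper, not its official code): standard softmax attention is equivalent to a normalized kernel smoother with an exponential kernel over query-key pairs. The function names and shapes below are illustrative assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard scaled dot-product attention (Vaswani et al., 2017).
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def kernel_attention(Q, K, V, kernel):
    # Attention as kernel smoothing: each output row is a
    # kernel-weighted average of the values, normalized over all keys.
    G = np.array([[kernel(q, k) for k in K] for q in Q])
    return (G / G.sum(axis=-1, keepdims=True)) @ V

# The (asymmetric) exponential kernel recovers softmax attention exactly,
# since the per-row normalization cancels the max-subtraction trick.
exp_kernel = lambda q, k: np.exp(q @ k / np.sqrt(len(q)))

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out_softmax = softmax_attention(Q, K, V)
out_kernel = kernel_attention(Q, K, V, exp_kernel)
```

Swapping `exp_kernel` for other kernels (e.g. symmetric or distance-based ones) then gives attention variants, which is the unifying angle the paper takes.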