slvher closed this 5 years ago
Transformer Dissection: An Unified Understanding for Transformer’s Attention via the Lens of Kernel https://arxiv.org/abs/1908.11775
Offers another perspective for understanding the Transformer's attention formulation, viewed through the lens of kernels.
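A minimal sketch of the kernel reading of attention, assuming standard scaled dot-product attention. Each output is a kernel-weighted average of the values, with weights k(q, k_i) / Σ_j k(q, k_j). With the exponential kernel exp(⟨q, k⟩ / √d), this reproduces softmax attention. The function names (`exp_kernel`, `kernel_smoother_attention`, `softmax_attention`) are hypothetical and are not taken from the paper's code. Queries and keys are assumed to be already projected (x·W_q, x·W_k), and positional terms are left out.

```python
import math
from typing import Callable, List

Vector = List[float]
Kernel = Callable[[Vector, Vector], float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def exp_kernel(q: Vector, k: Vector) -> float:
    # Exponential kernel exp(<q, k> / sqrt(d)); assumes q and k are already projected.
    return math.exp(dot(q, k) / math.sqrt(len(q)))


def kernel_smoother_attention(
    query: Vector, keys: List[Vector], values: List[Vector], kernel: Kernel = exp_kernel
) -> Vector:
    # Kernel smoother: sum_i k(q, k_i) * v_i / sum_j k(q, k_j).
    weights = [kernel(query, k) for k in keys]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def softmax_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    # Reference scaled dot-product attention (max-subtracted softmax for stability).
    scores = [dot(query, k) / math.sqrt(len(query)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    dim = len(values[0])
    return [sum(p * v[j] for p, v in zip(probs, values)) for j in range(dim)]


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0]]
    print(kernel_smoother_attention(q, keys, values))
    print(softmax_attention(q, keys, values))
    # A constant kernel turns attention into a plain average of the values.
    print(kernel_smoother_attention(q, keys, values, kernel=lambda a, b: 1.0))
```

Because the kernel is a parameter, other kernel choices can be substituted here to explore the design space the kernel view opens up. The constant kernel above is only a sanity check that reduces attention to uniform averaging.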