
ICLR 2021 | Random feature attention #67


richardbaihe commented 3 years ago

This paper proposes replacing the Transformer's softmax self-attention with random feature attention (RFA), a kernel approximation method. The calculation is shown below:

[figure: the paper's RFA attention formulation (including Eq. 4)]
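
To make the computation concrete, here is a minimal numpy sketch of the idea for a single query (my own illustration, not the paper's code; the paper's variance/temperature parameter is dropped): exact softmax attention versus its random-feature approximation, with l2-normalized queries and keys.

```python
import numpy as np

def random_feature_map(x, W):
    """Random Fourier features: inner products of these vectors approximate
    the Gaussian kernel exp(-||x - y||^2 / 2) when W has i.i.d. N(0, 1) rows."""
    proj = x @ W.T                        # (..., D)
    D = W.shape[0]
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1) / np.sqrt(D)

def softmax_attention(q, K, V):
    """Exact softmax attention for one query: softmax(q . k_i)-weighted sum of v_i."""
    weights = np.exp(K @ q)               # (n,)
    return weights @ V / weights.sum()    # (d_v,)

def random_feature_attention(q, K, V, W):
    """Linearized attention: exp(q . k_i) is replaced by phi(q) . phi(k_i),
    so keys and values are summarized once, independently of the query."""
    phi_q = random_feature_map(q, W)      # (2D,)
    phi_K = random_feature_map(K, W)      # (n, 2D)
    S = phi_K.T @ V                       # (2D, d_v): sum_i phi(k_i) outer v_i
    z = phi_K.sum(axis=0)                 # (2D,)
    return (phi_q @ S) / (phi_q @ z)      # (d_v,)

rng = np.random.default_rng(0)
d, d_v, n, D = 16, 16, 128, 256
q = rng.normal(size=d)
q /= np.linalg.norm(q)                                 # l2-normalize q and k,
K = rng.normal(size=(n, d))                            # so exp(q . k) stays bounded
K /= np.linalg.norm(K, axis=1, keepdims=True)
V = rng.normal(size=(n, d_v))
W = rng.normal(size=(D, d))                            # w_i ~ N(0, I_d)

exact = softmax_attention(q, K, V)
approx = random_feature_attention(q, K, V, W)
print("max abs error:", np.abs(exact - approx).max())
```

Since the summaries `S` and `z` do not depend on the query, in the causal case they can be updated token by token, which is where RFA's linear time and constant memory come from.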

Equation 4 above is supported by the following theorem:

[figure: the random-feature approximation theorem from the paper]
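
This appears to be the classical random Fourier feature result of Rahimi & Recht (2007). A restatement with unit-variance projections (the paper parameterizes the Gaussian with a variance parameter, omitted here):

$$
\phi(\mathbf{x}) = \frac{1}{\sqrt{D}}\bigl[\sin(\mathbf{w}_1^\top\mathbf{x}),\dots,\sin(\mathbf{w}_D^\top\mathbf{x}),\cos(\mathbf{w}_1^\top\mathbf{x}),\dots,\cos(\mathbf{w}_D^\top\mathbf{x})\bigr]^\top,\qquad \mathbf{w}_i \sim \mathcal{N}(\mathbf{0},\mathbf{I}_d),
$$

$$
\mathbb{E}_{\mathbf{w}}\bigl[\phi(\mathbf{x})\cdot\phi(\mathbf{y})\bigr] = \exp\!\Bigl(-\tfrac{1}{2}\lVert\mathbf{x}-\mathbf{y}\rVert^2\Bigr).
$$

If I recall correctly, the paper l2-normalizes queries and keys, so $\exp(\mathbf{q}\cdot\mathbf{k}) = e\cdot\exp(-\tfrac{1}{2}\lVert\mathbf{q}-\mathbf{k}\rVert^2)$; the constant $e$ cancels between the numerator and denominator of the attention weights, which justifies swapping the exponentials for random-feature dot products in both.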

Here are the results:

[figures: experimental results from the paper]