This paper proposes replacing the transformer's softmax self-attention with random feature attention (RFA), a kernel-approximation method. The calculation is shown below:
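Roughly, the computation (as I reconstruct it from the paper; $D$ is the number of random features and the $\mathbf{w}_i$ are sampled from $\mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$, so the notation may differ slightly from the original equations) is:

$$\phi(\mathbf{x}) = \sqrt{\tfrac{1}{D}}\,\bigl[\sin(\mathbf{w}_1\cdot\mathbf{x}),\ldots,\sin(\mathbf{w}_D\cdot\mathbf{x}),\;\cos(\mathbf{w}_1\cdot\mathbf{x}),\ldots,\cos(\mathbf{w}_D\cdot\mathbf{x})\bigr]^\top$$

$$\mathrm{RFA}(\mathbf{q}_t,\{\mathbf{k}_i\},\{\mathbf{v}_i\}) = \sum_i \frac{\phi(\mathbf{q}_t)^\top\phi(\mathbf{k}_i)}{\sum_j \phi(\mathbf{q}_t)^\top\phi(\mathbf{k}_j)}\,\mathbf{v}_i^\top = \frac{\phi(\mathbf{q}_t)^\top\sum_i \phi(\mathbf{k}_i)\otimes\mathbf{v}_i}{\phi(\mathbf{q}_t)\cdot\sum_j \phi(\mathbf{k}_j)}$$

Pulling $\phi(\mathbf{q}_t)$ out of the sums is what makes attention linear in sequence length and, in the causal/recurrent form, constant-memory per decoding step.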
where equation 4 is supported by the following theorem:
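If I'm reading it right, this is the classical random Fourier feature result (Rahimi & Recht, 2007): for $\mathbf{w}_i\sim\mathcal{N}(\mathbf{0},\sigma^2\mathbf{I})$,

$$\mathbb{E}_{\mathbf{w}}\bigl[\phi(\mathbf{x})^\top\phi(\mathbf{y})\bigr] = \exp\!\Bigl(-\frac{\lVert\mathbf{x}-\mathbf{y}\rVert^2}{2\sigma^2}\Bigr),$$

and since $\exp(\mathbf{x}\cdot\mathbf{y}) = \exp(\tfrac{\lVert\mathbf{x}\rVert^2}{2})\exp(\tfrac{\lVert\mathbf{y}\rVert^2}{2})\exp(-\tfrac{\lVert\mathbf{x}-\mathbf{y}\rVert^2}{2})$, the softmax numerator $\exp(\mathbf{q}_t\cdot\mathbf{k}_i)$ can be estimated by a dot product of features. A quick NumPy sanity check of the approximation (my own sketch, not the authors' code; `d`, `D`, and `phi` are just illustrative names):

```python
# Sanity check: random Fourier features approximate the Gaussian kernel
# exp(-||x - y||^2 / 2), which underlies the softmax approximation above.
import numpy as np

rng = np.random.default_rng(0)
d, D = 64, 4096                      # input dim, number of random features
x = rng.normal(size=d) / np.sqrt(d)
y = rng.normal(size=d) / np.sqrt(d)

W = rng.normal(size=(D, d))          # rows w_i ~ N(0, I), i.e. sigma = 1

def phi(v):
    # 2D-dimensional feature map [sin(Wv); cos(Wv)] / sqrt(D)
    return np.concatenate([np.sin(W @ v), np.cos(W @ v)]) / np.sqrt(D)

exact = np.exp(-np.sum((x - y) ** 2) / 2)   # Gaussian kernel value
approx = phi(x) @ phi(y)                    # random feature estimate
print(f"exact={exact:.4f}  approx={approx:.4f}")  # the two should be close
```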
Here are the results: