lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in Pytorch
MIT License

Rotary Position Embedding #81

Open ahmdtaha opened 2 years ago

ahmdtaha commented 2 years ago

Hi Phillip,

I have a question about your rotary position embedding implementation. I will use the notation and equation numbers from [1]. This line is the realization of Eq. 34 in [1]. If that is correct, why do you add a fixed positional embedding to the query/key/value in this line?

To put the same question another way: the position m in Eq. 13 appears only inside the 2x2 rotation matrix, so why add an encoding of position m to the query/key/value?
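
To make the question concrete, here is a minimal sketch of the formulation being asked about, assuming Eq. 34 has the element-wise form x * cos + rotate(x) * sin with angles m * theta_i. It is plain Python, not the code from this repository; the names `rotate_pairs` and `apply_rotary` and the default `base=10000.0` are illustrative assumptions. Position enters only through the rotation angles, and nothing is added to the vectors.

```python
import math

def rotate_pairs(x):
    # Hypothetical helper: (x1, x2, x3, x4, ...) -> (-x2, x1, -x4, x3, ...)
    out = []
    for i in range(0, len(x), 2):
        out += [-x[i + 1], x[i]]
    return out

def apply_rotary(x, m, base=10000.0):
    # Rotate each consecutive pair of features by the angle m * theta_j,
    # where theta_j = base ** (-2j / d) for pair index j (assumed schedule).
    d = len(x)
    angles = [m * base ** (-(i - i % 2) / d) for i in range(d)]
    r = rotate_pairs(x)
    return [xi * math.cos(a) + ri * math.sin(a)
            for xi, ri, a in zip(x, r, angles)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Arbitrary example vectors (made up for illustration).
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Both position pairs have the same offset m - n = 3.
score_a = dot(apply_rotary(q, 5), apply_rotary(k, 2))
score_b = dot(apply_rotary(q, 13), apply_rotary(k, 10))
print(abs(score_a - score_b) < 1e-9)
```

Under these assumptions the two scores agree up to floating-point error, because the rotated dot product depends only on the offset m - n. That is why an additional additive position term on the query/key/value looks unnecessary from the equations alone.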

BTW, thanks for all your open-source code, great job!

[1] RoFormer: Enhanced Transformer with Rotary Position Embedding