Hi Phillip,
I have a question about your rotary position embedding implementation. I will use the notation and equation numbers from [1]. This line is the realization of Eq. 34 in [1]. If I am correct, why do you add a fixed positional embedding to the query/key/value in this line?
To put the same question another way: the position m in Eq. 13 appears only inside the 2x2 rotation matrix. So why add an encoding of the position m to the query/key/value?
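To make the question concrete, here is a minimal sketch of how I read the rotation in [1]. It is not your code: the function names `rotary_angles`, `apply_rotary`, and `dot` are hypothetical, it works on plain Python lists for one position at a time, and it assumes the frequencies theta_i = 10000^(-2i/d) from [1]. The position m enters only through the rotation angle, and nothing is added to the vector.

```python
import math

def rotary_angles(position, dim, base=10000.0):
    """Angles m * theta_i for one position m, with theta_i = base^(-2i/dim)."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]

def apply_rotary(x, position, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle m * theta_i.

    Assumption: x is a flat list of even length (a single query or key).
    """
    out = []
    for i, angle in enumerate(rotary_angles(position, len(x), base)):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(angle), math.sin(angle)
        # 2x2 rotation applied to the pair; no additive position term
        out += [a * c - b * s, a * s + b * c]
    return out

def dot(u, v):
    """Plain inner product, standing in for one attention score."""
    return sum(a * b for a, b in zip(u, v))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (2), different absolute positions.
score_a = dot(apply_rotary(q, 3), apply_rotary(k, 1))
score_b = dot(apply_rotary(q, 7), apply_rotary(k, 5))
```

In this sketch `score_a` and `score_b` agree up to floating-point error, so the attention score depends only on the relative offset, which is why I expected the rotation alone to be sufficient.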
BTW, thanks for all your open-source code, great job!
[1] RoFormer: Enhanced Transformer with Rotary Position Embedding