hey Aleksa! hope you have been well! yes indeed, i get a number of emails because i have switched back and forth between the two ways of doing rotary, but the way you describe is the better one, despite needing a few more lines of code. perhaps one day we will migrate towards a complex representation, and for that the pairing should be done on the last dimension.
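To make the two layouts concrete, here is a minimal pure-Python sketch. The function names are hypothetical and this is not the x-transformers API. It assumes that "pairing on the last dimension" means rotating adjacent features (x[2i], x[2i+1]) together. It also uses the RoFormer-style frequencies theta_i = 10000^(-2i/d). Adjacent pairing lets each pair be read as one complex number a + ib, so the rotation becomes a single complex multiplication. The half-split layout applies the same rotation to a permuted feature order, so weights trained with one layout are not directly interchangeable with the other.

```python
import cmath
import math


def rotary_frequencies(dim, base=10000.0):
    # one angular frequency per feature pair: theta_i = base^(-2i / dim)
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def rotate_interleaved(x, pos, base=10000.0):
    """Rotate adjacent pairs (x[2i], x[2i+1]) by angle pos * theta_i, in real arithmetic."""
    assert len(x) % 2 == 0
    out = []
    for i, theta in enumerate(rotary_frequencies(len(x), base)):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out.extend([a * c - b * s, a * s + b * c])
    return out


def rotate_complex(x, pos, base=10000.0):
    """Same rotation, treating each adjacent pair as one complex number a + ib."""
    assert len(x) % 2 == 0
    out = []
    for i, theta in enumerate(rotary_frequencies(len(x), base)):
        # multiplying by e^(i * pos * theta) rotates the pair by that angle
        z = complex(x[2 * i], x[2 * i + 1]) * cmath.exp(1j * pos * theta)
        out.extend([z.real, z.imag])
    return out


def rotate_half_split(x, pos, base=10000.0):
    """The other layout: pairs (x[i], x[i + dim/2]), as in a 'rotate_half' style implementation."""
    assert len(x) % 2 == 0
    half = len(x) // 2
    first, second = [], []
    for i, theta in enumerate(rotary_frequencies(len(x), base)):
        a, b = x[i], x[i + half]
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        first.append(a * c - b * s)
        second.append(a * s + b * c)
    return first + second


x = [1.0, 2.0, 3.0, 4.0]
print(rotate_interleaved(x, pos=5))
print(rotate_complex(x, pos=5))  # same values, up to float rounding
```

In a tensor library the complex version would reshape the last dimension into (d/2, 2) pairs. PyTorch's `torch.view_as_complex`, for example, requires a trailing dimension of size 2, so the adjacent-pair layout is the one that maps directly onto such a complex view.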
could you let me know if this PR looks more intuitive?
as for the unrotated dimensions, if you are referring to `t_unrotated`, that's actually from GPT-J (adopted by AlphaCode too, i believe), where you leave some of the feature dimensions unrotated. it leads to slightly improved results.
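A minimal sketch of the partial-rotary idea, under stated assumptions. `partial_rotary` and `rot_dim` are hypothetical names, loosely comparable in spirit to the `rotary_dim` setting in the Hugging Face GPT-J config. The sketch rotates the *first* `rot_dim` features and passes the rest through untouched, but which slice stays unrotated is an assumption here. It also uses the adjacent-pair layout and computes frequencies over `rot_dim` rather than the full head dimension.

```python
import math


def rotate_interleaved(x, pos, base=10000.0):
    """Rotate adjacent pairs (x[2i], x[2i+1]) by pos * theta_i, theta_i = base^(-2i/len(x))."""
    assert len(x) % 2 == 0
    out = []
    for i in range(len(x) // 2):
        theta = base ** (-2 * i / len(x))
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out.extend([a * c - b * s, a * s + b * c])
    return out


def partial_rotary(x, pos, rot_dim, base=10000.0):
    """Rotate only the first rot_dim features; the remainder (t_unrotated) passes through."""
    assert rot_dim % 2 == 0 and 0 < rot_dim <= len(x)
    t, t_unrotated = x[:rot_dim], x[rot_dim:]
    # frequencies are computed over rot_dim, not the full feature dimension
    return rotate_interleaved(t, pos, base) + t_unrotated


head = [0.5, -1.0, 2.0, 0.3, 1.5, 0.2, -0.7, 1.1]
print(partial_rotary(head, pos=3, rot_dim=4))  # last 4 values are unchanged
```

The unrotated features carry no positional rotation at all, so they contribute to attention purely by content. Any benefit depends on the model and is not something this sketch demonstrates.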
https://github.com/lucidrains/x-transformers/issues/250