migrate to less confusing way of doing rotary

https://github.com/lucidrains/x-transformers/issues/250

hey Aleksa! hope you have been well! yes indeed, i get a number of emails because i have switched back and forth between the two ways of doing rotary (pairing adjacent feature dimensions vs. pairing the first half of the feature dimensions with the second half), but the way you describe is the better one, despite needing a few more lines of code. perhaps one day we will migrate to a complex representation, in which case the pairing should be done along the last dimension
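To make the two pairings concrete, here is a minimal pure-Python sketch on a single vector. It has no batching and no PyTorch, and it is not the x-transformers code. The function names and the base of 10000 are assumptions for illustration. The sketch shows that the adjacent pairing and the half-split pairing apply the same rotations and differ only by a fixed permutation of the features. It also shows that the complex-number view is exactly the adjacent pairing.

```python
import cmath
import math


def rotary_freqs(dim, base=10000.0):
    """Per-pair frequencies theta_i = base^(-2i/dim), i = 0 .. dim/2 - 1 (base is an assumption)."""
    assert dim % 2 == 0, "rotary needs an even number of feature dimensions"
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def rotate_pair(a, b, angle):
    """Rotate the 2D point (a, b) counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return a * c - b * s, a * s + b * c


def rotary_interleaved(x, pos, base=10000.0):
    """Pairing 1: adjacent features (x[2i], x[2i+1]) form the i-th 2D pair."""
    out = list(x)
    for i, theta in enumerate(rotary_freqs(len(x), base)):
        out[2 * i], out[2 * i + 1] = rotate_pair(x[2 * i], x[2 * i + 1], pos * theta)
    return out


def rotary_half(x, pos, base=10000.0):
    """Pairing 2 ("rotate half"): feature i is paired with feature i + dim/2."""
    half = len(x) // 2
    out = list(x)
    for i, theta in enumerate(rotary_freqs(len(x), base)):
        out[i], out[i + half] = rotate_pair(x[i], x[i + half], pos * theta)
    return out


def rotary_complex(x, pos, base=10000.0):
    """Complex view: each adjacent pair is one complex number, rotated by multiplication."""
    out = []
    for i, theta in enumerate(rotary_freqs(len(x), base)):
        z = complex(x[2 * i], x[2 * i + 1]) * cmath.exp(1j * pos * theta)
        out.extend([z.real, z.imag])
    return out


def deinterleave(x):
    """Permutation mapping the interleaved layout onto the half-split layout."""
    return list(x[0::2]) + list(x[1::2])


if __name__ == "__main__":
    x = [0.1 * (k + 1) for k in range(8)]
    pos = 3

    a = rotary_interleaved(x, pos)
    b = rotary_half(deinterleave(x), pos)
    c = rotary_complex(x, pos)

    # Same rotations, different feature layout: the pairings agree up to a permutation.
    print(all(math.isclose(u, v, abs_tol=1e-12) for u, v in zip(deinterleave(a), b)))  # → True
    # The complex-number form is exactly the adjacent (interleaved) pairing.
    print(all(math.isclose(u, v, abs_tol=1e-12) for u, v in zip(a, c)))  # → True
```

Since the two layouts differ only by a permutation of the features, this sketch suggests that neither is more expressive for a model trained from scratch. The practical cost of switching is that weights trained under one convention would need their query/key projections permuted before they could be used with the other.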

could you let me know if this PR looks more intuitive?

as for the unrotated dimensions, if you are referring to t_unrotated, that actually comes from GPT-J (adopted by AlphaCode too, i believe), where rotary is applied to only a subset of the feature dimensions and the rest are left unrotated. it leads to slightly improved results
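Here is a minimal sketch of the partial rotary idea, again in pure Python on single vectors. Only the first rot_dim features are rotated, and the remainder (t_unrotated) is concatenated back unchanged. The names rot_dim and partial_rotary, and the use of the half-split pairing, are assumptions chosen for illustration. The sketch does not reproduce GPT-J's or x-transformers' exact code. It also checks that the attention logit still depends only on the relative offset between the query and key positions.

```python
import math


def rotary_half(x, pos, base=10000.0):
    """Rotate feature i together with feature i + len(x)/2 by angle pos * theta_i."""
    dim = len(x)
    assert dim % 2 == 0, "the rotated slice needs an even number of features"
    half = dim // 2
    out = list(x)
    for i in range(half):
        angle = pos * base ** (-2 * i / dim)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out


def partial_rotary(t, pos, rot_dim):
    """Hypothetical helper: rotate only the first `rot_dim` features, pass the rest through."""
    t_rot, t_unrotated = t[:rot_dim], t[rot_dim:]
    return rotary_half(t_rot, pos) + list(t_unrotated)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    head_dim, rot_dim = 8, 4  # rotate 4 of the 8 features (illustrative sizes)
    q = [0.3, -0.1, 0.7, 0.2, 0.5, -0.4, 0.9, 0.05]
    k = [-0.2, 0.6, 0.1, 0.8, -0.3, 0.25, 0.4, -0.7]
    assert len(q) == len(k) == head_dim

    q3 = partial_rotary(q, pos=3, rot_dim=rot_dim)
    # The last head_dim - rot_dim features are identical to the input.
    print(q3[rot_dim:] == q[rot_dim:])  # → True

    # Logits still depend only on the relative offset (3 - 1 == 10 - 8 == 2).
    s1 = dot(partial_rotary(q, 3, rot_dim), partial_rotary(k, 1, rot_dim))
    s2 = dot(partial_rotary(q, 10, rot_dim), partial_rotary(k, 8, rot_dim))
    print(math.isclose(s1, s2, abs_tol=1e-9))  # → True
```

In this sketch the unrotated slice contributes a position-independent term to the query-key dot product. The rotated slice carries the relative-position signal.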

migrate to less confusing way of doing rotary #251