lucidrains / x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers
MIT License

migrate to less confusing way of doing rotary #251

Closed · lucidrains closed this 2 months ago

lucidrains commented 2 months ago

https://github.com/lucidrains/x-transformers/issues/250

hey Aleksa! hope you have been well! yes indeed, i get a number of emails because i switch between the two ways of doing rotary, but the way you describe is the better one, despite taking a few more lines of code. perhaps one day we will migrate to a complex representation, so that the pairing is done on the last dimension
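(For context, the "two ways of doing rotary" are two dimension-pairing conventions: interleaved pairing, where adjacent dimensions 2i and 2i+1 form each rotated pair, as in GPT-J, and half-split pairing, where dimension i pairs with i + d/2, as in GPT-NeoX's rotate_half. Below is a minimal sketch contrasting the two; the function names and the apply_rotary signature are illustrative, not x-transformers' actual API.)

```python
import torch

def rotate_half(x):
    # half-split pairing: dimension i pairs with i + d/2
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rotate_interleaved(x):
    # interleaved pairing: dimension 2i pairs with 2i + 1
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)

def apply_rotary(t, interleaved=False, base=10000):
    # t: (..., seq_len, dim) with dim even
    seq_len, dim = t.shape[-2], t.shape[-1]
    theta = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * theta[None, :]  # (seq, dim/2)

    if interleaved:
        freqs = angles.repeat_interleave(2, dim=-1)   # angles for pairs (0,1), (2,3), ...
        rot = rotate_interleaved
    else:
        freqs = torch.cat((angles, angles), dim=-1)   # angles for pairs (0, d/2), (1, d/2+1), ...
        rot = rotate_half

    # same 2D rotations either way; only the pairing of dimensions differs
    return t * freqs.cos() + rot(t) * freqs.sin()
```

(The complex-representation remark maps naturally onto the interleaved layout: torch.view_as_complex expects each real/imaginary pair to sit adjacently in the last dimension, i.e. x.reshape(..., dim // 2, 2), so the pairing would indeed live on the last dimension.)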

could you let me know if this PR looks more intuitive?

as for the unrotated dimensions, if you are referring to t_unrotated, that's actually from GPT-J (adopted by AlphaCode too, i believe), where you leave some of the feature dimensions unrotated. it leads to slightly improved results. a sketch of what that looks like is below
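(A minimal sketch of that partial-rotary idea, reusing the illustrative apply_rotary helper from the sketch above; apply_partial_rotary and rotary_dim are hypothetical names, not x-transformers' actual API.)

```python
import torch

def apply_partial_rotary(t, rotary_dim, interleaved=False):
    # GPT-J-style partial rotary: rotate only the first `rotary_dim`
    # feature dimensions; the t_unrotated remainder passes through unchanged
    t_rot, t_unrotated = t[..., :rotary_dim], t[..., rotary_dim:]
    t_rot = apply_rotary(t_rot, interleaved=interleaved)
    return torch.cat((t_rot, t_unrotated), dim=-1)
```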

gordicaleksa commented 2 months ago

Thanks Phil! Yeah, can't complain! :) I see you're as productive as ever!

The PR looks consistent with your other repo - lgtm!

yup, I was referring to t_unrotated. Thanks for the reference!