lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in PyTorch
MIT License

[Feature] Adding fixed positional embeddings as an option #47

Closed: gulnazaki closed this issue 3 years ago

gulnazaki commented 3 years ago

I believe that, although learnable positional embeddings are the trend nowadays, it would help to offer fixed embeddings (sinusoidal, as in the original Transformer) as an option for relatively small-dataset scenarios, where it is hard to learn a meaningful embedding. At the very least, it would be interesting to compare both methods.

I see you included fixed embeddings in the Reformer implementation, but wouldn't it be more efficient to calculate them once during initialization? (like here)
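
Roughly what I have in mind is a minimal sketch along the lines of that tutorial (the class name is just illustrative), with the sin/cos table built once in `__init__` and stored as a non-trainable buffer:

```python
import torch
from torch import nn

class FixedPositionalEmbedding(nn.Module):
    """Sinusoidal positional embeddings, precomputed once at init (assumes dim is even)."""
    def __init__(self, dim, max_seq_len):
        super().__init__()
        # standard sin/cos table from "Attention Is All You Need"
        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
        position = torch.arange(max_seq_len).float()
        sinusoid = torch.einsum('i,j->ij', position, inv_freq)    # (max_seq_len, dim // 2)
        emb = torch.cat((sinusoid.sin(), sinusoid.cos()), dim=-1) # (max_seq_len, dim)
        # buffer: saved with the module and moved across devices, but never trained
        self.register_buffer('emb', emb)

    def forward(self, x):
        # x: (batch, seq_len, dim); add the first seq_len positions
        return x + self.emb[None, :x.shape[1], :]
```

Usage would just be `x = pos_emb(token_embeddings)` before the attention layers, so nothing is recomputed per forward pass.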

By the way, I read a cool paper that compares fixed positional embeddings with the ones learned by BERT, GPT-2 and RoBERTa.

If you prefer, I could open a PR for this, adding the implementation from the PyTorch tutorial linked above; it's no big deal either way.

lucidrains commented 3 years ago

@gulnazaki yea sure, I would welcome a PR on that :D I'll check out the paper you recommended tonight, thank you! Another good one I read recently is https://arxiv.org/abs/2006.15595

gulnazaki commented 3 years ago

Seems pretty interesting, I'll check it out, thanks.

Ok, I'll give it a look later. Do you think axial positional embeddings would also be a good option to include?

lucidrains commented 3 years ago

@gulnazaki yea, axial is great! :)
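
For reference, the axial idea factors each position into a (row, column) pair so that two small learned tables replace one table covering every position; a rough sketch of the summed variant (illustrative only, not the actual performer-pytorch or axial-positional-embedding API):

```python
import torch
from torch import nn

class AxialPositionalEmbedding(nn.Module):
    """Two small learned tables (one per axis) whose broadcast sum covers rows * cols positions."""
    def __init__(self, dim, axial_shape):
        super().__init__()
        rows, cols = axial_shape
        self.max_seq_len = rows * cols
        # one table per axis; broadcasting their sum yields a (rows, cols, dim) grid
        self.row_emb = nn.Parameter(torch.randn(rows, 1, dim) * 0.02)
        self.col_emb = nn.Parameter(torch.randn(1, cols, dim) * 0.02)

    def forward(self, x):
        # x: (batch, seq_len, dim) with seq_len <= rows * cols
        pos = (self.row_emb + self.col_emb).reshape(self.max_seq_len, -1)
        return x + pos[None, :x.shape[1], :]
```

The appeal is the parameter count: (rows + cols) * dim learned values instead of rows * cols * dim, which matters for the long sequences Performer targets.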