lucidrains / rotary-embedding-torch

Implementation of Rotary Embeddings, from the RoFormer paper, in PyTorch
MIT License

xPOS embeddings during inference #27

Closed VarunGumma closed 3 months ago

VarunGumma commented 3 months ago

Hi @lucidrains,

I see that when using RoPE during inference (with cached keys), we have to use the rotate_queries_with_cached_keys(q, k) method, or manually add an offset. What would be the case for xPOS? I see that an offset is required as per torchscale, but there is no provision for it in this implementation. Can you please take a look and fix it?
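For context, here is a minimal sketch of the cached-key scenario being described, assuming the RotaryEmbedding API from this repo's README (rotate_queries_or_keys with an offset keyword, and rotate_queries_with_cached_keys); the tensor shapes are illustrative only:

```python
import torch
from rotary_embedding_torch import RotaryEmbedding

rotary_emb = RotaryEmbedding(dim = 32)

# prefill: queries and keys cover the same positions 0..1023
q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)
q = rotary_emb.rotate_queries_or_keys(q)
k = rotary_emb.rotate_queries_or_keys(k)

# decoding step: a single new query attends over 1025 cached keys,
# so the query must be rotated at position 1024, not position 0
q_new = torch.randn(1, 8, 1, 64)
k_all = torch.randn(1, 8, 1025, 64)

# option 1: supply the offset manually
q_rot = rotary_emb.rotate_queries_or_keys(q_new, offset = 1024)
k_rot = rotary_emb.rotate_queries_or_keys(k_all)

# option 2: let the library infer the offset from the key length
q_rot, k_rot = rotary_emb.rotate_queries_with_cached_keys(q_new, k_all)
```

The question in this issue is how the same offset handling should work when the embedding is constructed with the xPOS variant, which scales queries and keys and therefore cannot treat them independently.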

lucidrains commented 3 months ago

@VarunGumma hey Varun! yes, you are right, I didn't account for xpos in that function

do you want to see if 0.6.3 works?

VarunGumma commented 3 months ago

@lucidrains, thanks for the quick fix! I have not tested it, but to the best of my knowledge, that's how it should be. I have integrated your module into my fairseq repo, which can be used to train RoPE-based transformer models.
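As a rough sketch of how the xPOS path would be exercised after the fix, assuming use_xpos, rotate_queries_and_keys, and rotate_queries_with_cached_keys behave as in the README (the version referenced above is untested here, as the commenter notes):

```python
import torch
from rotary_embedding_torch import RotaryEmbedding

# xPOS variant: queries and keys receive reciprocal scales,
# so they must be rotated together rather than independently
rotary_emb = RotaryEmbedding(dim = 32, use_xpos = True)

# training / prefill: equal-length queries and keys
q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)
q, k = rotary_emb.rotate_queries_and_keys(q, k)

# inference with cached keys: the single new query sits at the last
# position, which is the offset handling discussed in this issue
q_new = torch.randn(1, 8, 1, 64)
k_all = torch.randn(1, 8, 1025, 64)
q_rot, k_rot = rotary_emb.rotate_queries_with_cached_keys(q_new, k_all)
```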