Closed VarunGumma closed 3 months ago
@VarunGumma hey Varun! yes you are right i didn't account for xpos in that function
do you want to see if 0.6.3 works?
@lucidrains, thanks for the quick fix! I have not tested it yet, but to my knowledge that is how it should behave. I have integrated your module into my fairseq repo, which can help train RoPE-based transformer models.
Hi @lucidrains,
I see that when using RoPE during inference (with cached keys), we have to use the `rotate_queries_with_cached_keys(q, k)` method, or manually add an `offset`. What would be the equivalent for xPos? An offset is required as per torchscale, but there is no provision for it in this implementation. Can you please take a look and fix it?
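For context, here is a minimal NumPy sketch of why the offset matters for cached-key decoding. This is not the library's actual API, just an illustration: `rotary` is a hypothetical helper that applies the standard RoPE rotation at absolute positions starting from `offset`. During incremental decoding, the new query sits at position `len(cached_keys)`, so it must be rotated with that offset while the cached keys were rotated from position 0 (xPos would additionally need the same offset to pick the right decay scale per position).

```python
import numpy as np

def rotary(x, offset=0, base=10000.0):
    # x: (seq_len, dim) with even dim; rotate each (even, odd) feature pair
    # by an angle that depends on the token's absolute position.
    seq_len, dim = x.shape
    pos = np.arange(offset, offset + seq_len)[:, None]       # absolute positions
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # (dim/2,)
    angles = pos * inv_freq                                  # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# cached keys were rotated at positions 0..4;
# the single new query must be rotated at position 5, hence offset=5
k = rotary(np.random.randn(5, 8))
q = rotary(np.random.randn(1, 8), offset=5)
```

Rotating the new query at offset 0 instead would silently shift all relative distances by the cache length, which is exactly the bug this kind of `rotate_queries_with_cached_keys` helper guards against.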