Closed orrzohar closed 1 month ago
Hi Orr
Thank you for your interest in our paper.
Our code is based on code-llama's RoPE implementation https://github.com/meta-llama/codellama/blob/e81b597e44dbecc2a0dedb9949fdf84adfc22395/llama/model.py#L55-L85
As you said, it uses complex numbers and is different from a sin-cos implementation. However, I have never tested a sin-cos implementation, so I don't know anything about the speed difference. I think it is likely a good implementation, since Code Llama is a large, well-maintained project.
Best, Heo
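For reference, the complex-number formulation Heo mentions is mathematically equivalent to the sin-cos one: viewing each consecutive channel pair as a complex number and multiplying by a unit complex number `exp(i*theta)` performs the same 2x2 rotation that the sin-cos form writes out explicitly. A minimal NumPy sketch (not the repo's exact code; function names are illustrative):

```python
import numpy as np

def rope_complex(x, theta):
    """Rotate channel pairs of x by angles theta via complex multiplication.

    x: (..., d) with d even; theta: (..., d/2), one angle per channel pair.
    """
    xc = x[..., 0::2] + 1j * x[..., 1::2]   # pairs -> complex numbers
    yc = xc * np.exp(1j * theta)            # rotation = unit complex multiply
    out = np.empty_like(x)
    out[..., 0::2] = yc.real
    out[..., 1::2] = yc.imag
    return out

def rope_sincos(x, theta):
    """Same rotation written out with explicit sines and cosines."""
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Since the two produce identical outputs, any runtime difference comes only from how well the framework handles complex tensors versus a few extra real-valued ops, which is worth benchmarking on your hardware rather than assuming.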
Hi Heo,
I'll look into this -- I am trying to scale your implementation and want to make sure I am not going to significantly slow down my codebase :)
Best, Orr
Hi,
The application of the 2D RoPE as described, at least in your implementation:
https://github.com/naver-ai/rope-vit/blob/c316aecc0be984dc3d23fa79c056da290be7a3ec/self-attn/rope_self_attn.py#L61-L67
is different from the standard one. Wouldn't your implementation be a lot slower than the usual sin-cos version? Do you have a sense of how much of an effect this has on training time?
Best, Orr
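For anyone following along, the "2D" part of 2D RoPE commonly refers to how the rotation angles are built for image patches: half of the channel pairs rotate with the patch's x coordinate and the other half with its y coordinate, each against a geometric frequency ladder. A minimal sketch of that axial construction (not necessarily the repo's exact parameterization; `base` is an assumed hyperparameter):

```python
import numpy as np

def axial_2d_angles(h, w, dim, base=100.0):
    """Build per-position rotation angles for a h*w patch grid.

    dim: channel dimension, must be divisible by 4 so that half the
    channel pairs follow x and half follow y.
    Returns angles of shape (h*w, dim // 2), one angle per channel pair.
    """
    assert dim % 4 == 0
    # Geometric frequency ladder, one frequency per pair in each axis half.
    freqs = 1.0 / (base ** (np.arange(dim // 4) / (dim // 4)))
    ys, xs = np.mgrid[0:h, 0:w]                 # integer patch coordinates
    ang_x = xs.reshape(-1, 1) * freqs           # (h*w, dim // 4)
    ang_y = ys.reshape(-1, 1) * freqs           # (h*w, dim // 4)
    return np.concatenate([ang_x, ang_y], axis=-1)
```

These angles would then feed the complex or sin-cos rotation discussed above; precomputing them once per resolution keeps the per-step cost to a single elementwise rotation, which is why the implementation choice mostly matters for that final multiply.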