naver-ai / rope-vit

[ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer"
https://arxiv.org/abs/2403.13298

inference speed #8

Closed orrzohar closed 1 month ago

orrzohar commented 1 month ago

Hi,

The way the 2D RoPE is applied, at least in your implementation:

https://github.com/naver-ai/rope-vit/blob/c316aecc0be984dc3d23fa79c056da290be7a3ec/self-attn/rope_self_attn.py#L61-L67

is different from the standard one:

import torch

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

Wouldn't your implementation be a lot slower than the standard one? Do you have a sense of how much of an effect this has on training time?
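For concreteness, this is roughly the comparison I have in mind; the shapes are arbitrary ViT-like values and the angle/cos/sin tensors are random, so it only measures speed, not numerical equivalence:

import time
import torch

# arbitrary ViT-like shapes (hypothetical): batch, heads, tokens, head dim
B, H, N, D = 32, 12, 197, 64
q = torch.randn(B, H, N, D)

# complex-number rotation: view the head dim as D/2 complex pairs
freqs_cis = torch.polar(torch.ones(N, D // 2), torch.randn(N, D // 2))

def rotate_complex(x):
    x_ = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    return torch.view_as_real(x_ * freqs_cis).flatten(-2)

# sin-cos rotation (random cos/sin, only meaningful for timing)
cos, sin = torch.randn(N, D), torch.randn(N, D)

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rotate_sincos(x):
    return x * cos + rotate_half(x) * sin

for name, fn in [("complex", rotate_complex), ("sin-cos", rotate_sincos)]:
    fn(q)  # warm-up
    t0 = time.perf_counter()
    for _ in range(100):
        fn(q)
    print(name, time.perf_counter() - t0)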

Best, Orr

bhheo commented 1 month ago

Hi Orr

Thank you for your interest in our paper

Our code is based on code-llama's RoPE implementation: https://github.com/meta-llama/codellama/blob/e81b597e44dbecc2a0dedb9949fdf84adfc22395/llama/model.py#L55-L85

As you said, it uses complex numbers and is different from the sin-cos implementation. But I have never tested the sin-cos implementation, so I can't say much about the speed difference. I think it is probably a good implementation, because code-llama is a big project.
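
For reference, the core of that complex-number rotation looks roughly like this; it is a simplified sketch, not a verbatim copy (the broadcasting helper for freqs_cis from the original file is omitted, and freqs_cis is assumed to already broadcast against the query/key shapes):

import torch

def apply_rotary_emb(xq, xk, freqs_cis):
    # view the last dim as (D/2, 2) pairs and treat them as complex numbers
    xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
    xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2))
    # the rotation is a complex multiplication with the precomputed angle table
    xq_out = torch.view_as_real(xq_ * freqs_cis).flatten(-2)
    xk_out = torch.view_as_real(xk_ * freqs_cis).flatten(-2)
    return xq_out.type_as(xq), xk_out.type_as(xk)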

Best, Heo

orrzohar commented 1 month ago

Hi Heo,

I'll look into this -- I am trying to scale your implementation and want to make sure that I am not going to significantly slow down my codebase :)

Best, Orr