hustvl / Vim

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Rotary Positional Embedding #66

Open AliYoussef97 opened 2 months ago

AliYoussef97 commented 2 months ago

Hello,

Thank you for the amazing work!

I had a brief question: I do not quite understand why the rotary positional encoding is applied to the hidden states and residuals before each Mamba layer, instead of being applied once after patch embedding. Moreover, am I right in assuming that if if_rope=True, then if_abs_pos_embed=False?
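
For context, here is a minimal sketch of the two placements I am comparing. The names (`apply_rope`, `ToyBlock`, the forward helpers) are illustrative placeholders, not the actual Vim code:

```python
import torch
import torch.nn as nn


def apply_rope(x: torch.Tensor) -> torch.Tensor:
    """Placeholder rotary embedding: mixes channel pairs with a fixed rotation.

    A real RoPE uses position-dependent sin/cos tables; this stub only keeps
    shapes consistent for the sketch.
    """
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([x1 * 0.5 - x2 * 0.5, x1 * 0.5 + x2 * 0.5], dim=-1)


class ToyBlock(nn.Module):
    """Stand-in for a Mamba block: returns updated hidden states and residual."""

    def __init__(self, dim: int):
        super().__init__()
        self.mixer = nn.Linear(dim, dim)

    def forward(self, hidden, residual):
        residual = hidden if residual is None else residual + hidden
        hidden = self.mixer(residual)
        return hidden, residual


def forward_rope_once(x, layers):
    # Placement A: apply RoPE a single time, right after patch embedding.
    hidden, residual = apply_rope(x), None
    for layer in layers:
        hidden, residual = layer(hidden, residual)
    return hidden


def forward_rope_per_layer(x, layers):
    # Placement B (what I see in the code): apply RoPE to the hidden states
    # and the residual before every Mamba layer.
    hidden, residual = x, None
    for layer in layers:
        hidden = apply_rope(hidden)
        if residual is not None:
            residual = apply_rope(residual)
        hidden, residual = layer(hidden, residual)
    return hidden


if __name__ == "__main__":
    dim = 8
    layers = nn.ModuleList([ToyBlock(dim) for _ in range(2)])
    tokens = torch.randn(1, 4, dim)  # (batch, num_patches, dim)
    print(forward_rope_once(tokens, layers).shape)
    print(forward_rope_per_layer(tokens, layers).shape)
```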

Edit: I understand now why the rotary positional encoding is applied at each layer.

Thank you!

MarioPaps commented 2 months ago

Hi,

I was also wondering about the same thing. Could you please provide some more details about why the rotary positional encoding is applied at each layer?