I was also wondering about the same thing. Could you please provide some more details about why the rotary positional encoding is applied at each layer?
Hello,
Thank you for the amazing work!
I had a brief question: I do not quite understand why the rotary positional encoding is applied to the hidden states and residuals before each Mamba layer, instead of being applied once after the patch embedding. Moreover, am I right in assuming that if `if_rope=True`, then `if_abs_pos_embed=False`?
Edit: I understand now why the rotary positional encoding is applied at each layer.
Thank you!
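The thread does not spell out the reasoning, so the following is only a minimal sketch of the two placements being compared. It is not the repository's code. It assumes a plain 1-D rotary encoding, where each feature pair of the token at position `p` is rotated by `p * base**(-2i/d)`. It also uses a hypothetical `toy_mixer` as a stand-in for a Mamba block, plus a simple "residual += hidden, then mix" loop. `rope`, `toy_mixer`, `forward` and `rope_every_layer` are illustrative names, not the project's API. Variant A rotates both the hidden states and the residual stream before every layer, as described in the question. Variant B applies the rotation once, right after the (omitted) patch embedding.

```python
import math
from typing import List, Optional

Tokens = List[List[float]]  # one feature vector per token position


def rope(tokens: Tokens, base: float = 10000.0) -> Tokens:
    """Illustrative 1-D rotary encoding: rotate each feature pair (x[2i], x[2i+1])
    of the token at position `pos` by the angle pos * base**(-2i/d)."""
    out = []
    for pos, x in enumerate(tokens):
        d = len(x)
        assert d % 2 == 0, "feature dimension must be even"
        y = [0.0] * d
        for i in range(d // 2):
            theta = pos * base ** (-2 * i / d)
            c, s = math.cos(theta), math.sin(theta)
            a, b = x[2 * i], x[2 * i + 1]
            y[2 * i] = a * c - b * s  # 2-D rotation of the pair
            y[2 * i + 1] = a * s + b * c
        out.append(y)
    return out


def add(a: Tokens, b: Tokens) -> Tokens:
    return [[u + v for u, v in zip(x, y)] for x, y in zip(a, b)]


def toy_mixer(tokens: Tokens) -> Tokens:
    # Hypothetical stand-in for a Mamba mixer: a scaled causal running sum.
    out, acc = [], None
    for x in tokens:
        acc = list(x) if acc is None else [p + q for p, q in zip(acc, x)]
        out.append([0.5 * v for v in acc])
    return out


def forward(tokens: Tokens, num_layers: int, rope_every_layer: bool) -> Tokens:
    hidden: Tokens = tokens
    residual: Optional[Tokens] = None
    if not rope_every_layer:
        hidden = rope(hidden)  # variant B: encode positions once, up front
    for _ in range(num_layers):
        if rope_every_layer:
            # variant A: rotate hidden states and residual before each layer
            hidden = rope(hidden)
            if residual is not None:
                residual = rope(residual)
        residual = hidden if residual is None else add(residual, hidden)
        hidden = toy_mixer(residual)
    return add(residual, hidden)


T = [[1.0, 0.0], [1.0, 0.0]]
print(forward(T, 2, True))
print(forward(T, 2, False))
```

With `num_layers=1` the two variants perform exactly the same operations. From two layers on they generally differ at every position except 0, where the rotation angle is zero. The reason is that each extra `rope` call rotates a token again by its position-dependent angle, including the accumulated residual. The sketch only shows where the rotation happens in each variant. It says nothing about which placement trains better.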