srush / annotated-mamba

Annotated version of the Mamba paper
MIT License

associative ssm op order #5

Open kpich opened 5 months ago

kpich commented 5 months ago

I might be wrong, but in the Colab/blog, I think the $\oplus$ op used for the associative scan in the selective state space model should have $a_2 a_1$ (rather than $a_1 a_2$) as the value of its first output, reflecting the fact that the leftmost $A$ transform in the sequence gets applied first and so sits rightmost in the product.
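To spell out the ordering, here is a short derivation, assuming the scan computes a first-order recurrence of the form $h_t = a_t h_{t-1} + b_t$ (the symbols $h_t$ and $b_t$ are assumptions for illustration, not copied from the blog). Unrolling two steps:

$$
\begin{aligned}
h_1 &= a_1 h_0 + b_1 \\
h_2 &= a_2 h_1 + b_2 = (a_2 a_1)\, h_0 + (a_2 b_1 + b_2)
\end{aligned}
$$

Under that assumption, the combined step would be $(a_1, b_1) \oplus (a_2, b_2) = (a_2 a_1,\; a_2 b_1 + b_2)$, with $a_1$ applied to $h_0$ first and therefore written on the right.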

(Since the Mamba matrices are diagonal and therefore commute, it doesn't actually matter here, I guess. I just found this initially confusing in the presentation.)
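A minimal sketch of why the order only matters for non-commuting matrices, assuming each step has the form $h \mapsto a h + b$. All helper names here (`combine`, `combine_reversed`, and so on) are hypothetical and are not the blog's PyTorch or Triton code; plain Python lists stand in for 2x2 matrices.

```python
# Hypothetical helpers for 2x2 matrices stored as nested lists (illustration only).
def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(p, v):
    return [sum(p[i][k] * v[k] for k in range(2)) for i in range(2)]

def vadd(u, v):
    return [x + y for x, y in zip(u, v)]

def combine(left, right):
    """Compose two steps h -> a h + b, where `left` is applied first."""
    a1, b1 = left
    a2, b2 = right
    return matmul(a2, a1), vadd(matvec(a2, b1), b2)

def combine_reversed(left, right):
    """Same as combine, but with the first output written as a1 a2."""
    a1, b1 = left
    a2, b2 = right
    return matmul(a1, a2), vadd(matvec(a2, b1), b2)

def apply_step(step, h):
    a, b = step
    return vadd(matvec(a, h), b)

def run_sequential(steps, h):
    # Reference: apply the steps one at a time, earliest first.
    for step in steps:
        h = apply_step(step, h)
    return h

h0 = [1, 2]

# Non-commuting (non-diagonal) matrices: only a2 a1 matches the recurrence.
steps = [([[1, 1], [0, 1]], [1, 0]),
         ([[1, 0], [1, 1]], [0, 1])]
expected = run_sequential(steps, h0)
good = apply_step(combine(*steps), h0)
bad = apply_step(combine_reversed(*steps), h0)
print(expected, good, bad)

# Diagonal matrices commute, so both orderings give the same combined step.
diag_steps = [([[2, 0], [0, 3]], [1, 0]),
              ([[5, 0], [0, 7]], [0, 1])]
print(combine(*diag_steps) == combine_reversed(*diag_steps))
```

With the non-diagonal matrices, the sequential result agrees with `combine` and disagrees with `combine_reversed`; with the diagonal ones, the two orderings are indistinguishable, which is consistent with the point above.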

It looks correct in the Triton first_order_op, but I think it is reversed in the reference PyTorch op and the LaTeX above it.

Thanks for this terrific writeup! It really clarified some things for me.

srush commented 5 months ago

Oh, that sounds right. I will fix it in my next version!