state-spaces / mamba

Mamba SSM architecture

How is the SSD framework connected to the Mamba 2 architecture? #447

Open roxanneSJ opened 2 months ago

roxanneSJ commented 2 months ago

I didn't quite understand the connection between the SSD framework and the Mamba 2 architecture. Reading the paper, I got the impression that the SSM block in Mamba 2, unlike in Mamba 1, is an SSD, i.e. a restricted kind of SSM that is analogous to structured masked attention. Is that really the case? If not, what is the connection between the SSD framework and Mamba 2? Is the SSM block in Mamba 1 also an SSD, and you only wanted to show the connection to attention in the "Transformers are SSMs" paper? Or is SSD a separate layer from the SSM, and in that case, how are they connected in Mamba 2? To sum up, it would really help if you could clarify how exactly the SSD framework is connected to Mamba 2. Thanks in advance :)
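
For what it's worth, here is a toy sketch of my current understanding (purely illustrative, not the repo's actual code; it assumes the SSD layer boils down to an SSM whose transition is a single scalar `a_t` times the identity). It shows that such an SSM can be evaluated either as a linear recurrence or as a masked, attention-like matrix product, which is how I read the "SSD ≈ structured masked attention" statement:

```python
import torch

def ssd_recurrent(x, a, B, C):
    # Linear recurrence form: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = C_t . h_t
    # x: (T,), a: (T,), B: (T, N), C: (T, N) -- toy single-head, single-channel case
    T, N = B.shape
    h = torch.zeros(N)
    y = torch.zeros(T)
    for t in range(T):
        h = a[t] * h + B[t] * x[t]
        y[t] = (C[t] * h).sum()
    return y

def ssd_attention_like(x, a, B, C):
    # Dual "masked attention" form: y = (L * (C @ B^T)) @ x,
    # where L[t, s] = a_{s+1} * ... * a_t for s <= t and 0 otherwise
    cum = torch.cumsum(torch.log(a), dim=0)        # cum[t] = sum_{k<=t} log a_k
    L = torch.tril(torch.exp(cum[:, None] - cum[None, :]))
    M = (C @ B.T) * L                              # lower-triangular "attention" matrix
    return M @ x

torch.manual_seed(0)
T, N = 8, 4
x, B, C = torch.randn(T), torch.randn(T, N), torch.randn(T, N)
a = 0.05 + 0.9 * torch.rand(T)                     # per-step scalar decays in (0, 1)
print(torch.allclose(ssd_recurrent(x, a, B, C), ssd_attention_like(x, a, B, C), atol=1e-5))
```

Both functions give the same output here, which is what made me think the SSD block is literally this kind of scalar-decay SSM. Please correct me if I'm misreading it.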

arelkeselbri commented 2 months ago

I'm wondering that too. It seems that Mamba-2 uses a very structured A matrix, with only a single scalar per head. Is that correct?
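
To make concrete what I mean (a toy sketch under my own reading of the paper, not taken from the repo):

```python
import torch

d_state, nheads = 16, 4
h = torch.randn(nheads, d_state)       # one state vector per head

# Mamba-1-style diagonal A: a separate decay rate for every state dimension
A_diag = torch.rand(nheads, d_state)
h_next_diag = A_diag * h               # each state entry decays at its own rate

# What I understand Mamba-2 / SSD to use: A_t = a_t * I, one scalar per head
a_scalar = torch.rand(nheads, 1)
h_next_scalar = a_scalar * h           # the whole head state decays uniformly
```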