I didn't quite understand the connection between the SSD framework and the Mamba 2 architecture. Reading the article, I got the impression that the SSM block in Mamba 2, unlike the one in Mamba 1, is an SSD, i.e. a kind of SSM analogous to structured masked attention. Is that really the case? If not, what is the connection between the SSD framework and Mamba 2? Is the SSM block in Mamba 1 also an SSD, and did you only want to show the connection to attention in the "Transformers are SSMs" article? Or is SSD a separate layer from the SSM, and if so, how are the two connected in Mamba 2? To sum up, it would really help if you could clarify how exactly the SSD framework relates to Mamba 2. Thanks in advance :)