state-spaces / mamba

Mamba SSM architecture
Apache License 2.0

Understanding about the selective scan #532

Open Aristo23333 opened 3 weeks ago

Aristo23333 commented 3 weeks ago

Hi author, in my opinion, the selective scan plays a very important part in Mamba's ability to handle long sequences. But I cannot clearly understand how the "select" is implemented. Is it a random cut of the whole sequence, or some kind of selection strategy? I noticed that in Mamba2 you mention that the multiplication of A matrices brings some degree of selective ability. Is that the "selective mechanism" you think Mamba really contains? Thank you!

Hprairie commented 3 weeks ago

I'm not an author, but the selective mechanism in Mamba is that $B$, $C$, and $\Delta$ are functions of the input $x$ rather than static weights. The model can therefore learn a function $f$ that essentially admits only important information into the state, which is the sense in which it is selective. There is a deeper connection to gating in RNNs; I would read about it in the original Mamba paper, which paints a very intuitive picture of selectivity.
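To make that concrete, here is a minimal sketch of a sequential selective scan for a single scalar channel, in plain Python. It is not the repository's CUDA kernel and makes several simplifying assumptions: the names `selective_scan`, `w_B`, `w_C`, `w_dt`, and `b_dt` are hypothetical, $A$ is taken to be diagonal and negative, $B$, $C$, and $\Delta$ are computed from a scalar input by simple scaling (the real model uses learned projections of the full token vector), and the discretization is $\bar{A} = \exp(\Delta A)$, $\bar{B} = \Delta B$. The point it illustrates is only that the parameters are recomputed from each input, with no random cut of the sequence.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, A, w_B, w_C, w_dt, b_dt):
    """Sequential selective scan for one scalar channel (illustrative only).

    xs:        list of scalar inputs, one per time step
    A:         diagonal of the state matrix (assumed negative), length N
    w_B, w_C:  hypothetical weights mapping x_t to B_t and C_t, length N
    w_dt, b_dt: hypothetical scalar weight and bias for the step size
    """
    n = len(A)
    h = [0.0] * n  # hidden state starts at zero
    ys, deltas = [], []
    for x in xs:
        # Input-dependent parameters: this is the "selection".
        delta = softplus(w_dt * x + b_dt)
        B = [w * x for w in w_B]
        C = [w * x for w in w_C]
        # Discretize with the current step size (assumed simplified rule).
        A_bar = [math.exp(delta * a) for a in A]
        B_bar = [delta * b for b in B]
        # Recurrence: a small delta mostly keeps the old state,
        # a large delta decays it and lets the current input in.
        h = [A_bar[i] * h[i] + B_bar[i] * x for i in range(n)]
        ys.append(sum(C[i] * h[i] for i in range(n)))
        deltas.append(delta)
    return ys, deltas, h


if __name__ == "__main__":
    ys, deltas, h = selective_scan(
        [1.0, 0.0, 2.0], A=[-1.0, -0.5], w_B=[1.0, 1.0],
        w_C=[1.0, 1.0], w_dt=1.0, b_dt=0.0,
    )
    print(ys, deltas, h)
```

With static $B$, $C$, and $\Delta$ the loop body would use the same values at every step; here they change with `x`, so each token controls how much of itself enters the state and how much of the past is kept.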

Aristo23333 commented 3 weeks ago


Thank you for pointing this out. I will research it more!