state-spaces / mamba

Mamba SSM architecture
Apache License 2.0

Flexible creating of parameters in Training? #426

Open Aristo23333 opened 3 months ago

Aristo23333 commented 3 months ago

Hi author, thank you for your talented work! I noticed that in mamba_simple.py, in the function mamba_inner_fn, B and C can be passed as input variables, like A. Does this mean I can supply my own B and C for training, as with A? Does this affect the time-variant characteristics of Mamba that you mentioned in your paper?

Hprairie commented 3 months ago

In Mamba, B and C are functions of x (the first part of the xz variable). To create B and C, I'm guessing they use x_proj to get something like B, C, delta = x_proj(x), which means that B, C, and delta are input-dependent. If you want them to be static, i.e. not time-variant, you can make them parameters, as in the S4 papers. However, the paper's ablations show that selectivity is important for language, so it would be best to leave them as functions of x if the downstream task is language.

That being said, their CUDA kernel does technically allow B and C to be static rather than functions of x, but again, this has been shown to give worse downstream performance.
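To make the distinction above concrete, here is a minimal NumPy sketch of a sequential reference scan. It is not the repository's CUDA kernel or its Python reference. The function name `selective_scan_ref`, the weights `W_B` and `W_C`, and all shapes are assumptions chosen for illustration. The same loop handles input-dependent B and C (one row per time step) and static B and C (one vector shared by all steps).

```python
import numpy as np

def selective_scan_ref(x, delta, A, B, C):
    """Sequential reference scan for a single sequence (illustrative only).

    x, delta: (L, D)  inputs and step sizes
    A:        (D, N)  per-channel diagonal state matrix
    B, C:     (L, N) if input-dependent, or (N,) if static
    Returns y with shape (L, D).
    """
    L, D = x.shape
    N = A.shape[1]
    # Broadcast static B/C to every time step so one loop covers both cases.
    B = np.broadcast_to(B, (L, N))
    C = np.broadcast_to(C, (L, N))
    h = np.zeros((D, N))
    y = np.empty((L, D))
    for t in range(L):
        dA = np.exp(delta[t][:, None] * A)                        # (D, N)
        dBx = delta[t][:, None] * B[t][None, :] * x[t][:, None]   # (D, N)
        h = dA * h + dBx                                          # state update
        y[t] = h @ C[t]                                           # (D,)
    return y

rng = np.random.default_rng(0)
L, D, N = 6, 4, 3
x = rng.standard_normal((L, D))
delta = np.log1p(np.exp(rng.standard_normal((L, D))))  # softplus keeps steps positive
A = -np.exp(rng.standard_normal((D, N)))               # negative entries for a decaying state

# Selective (Mamba-style): B and C are linear functions of x at each step.
W_B = rng.standard_normal((D, N))  # hypothetical projection weights
W_C = rng.standard_normal((D, N))
y_selective = selective_scan_ref(x, delta, A, x @ W_B, x @ W_C)

# Static (S4-style): one B and one C shared by all time steps.
B_static = rng.standard_normal(N)
C_static = rng.standard_normal(N)
y_static = selective_scan_ref(x, delta, A, B_static, C_static)
```

In this sketch only B and C change between the two calls. The step size delta still varies per time step in both, so making B and C static does not by itself make the recurrence fully time-invariant.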

tridao commented 3 months ago

By default, B and C are the results of x_proj(x) (i.e. B and C are input-dependent), which is computed inside mamba_inner_fn. If you pass something other than None for B or C to mamba_inner_fn, those values are used instead. It might work, but that code path is not used extensively and as a result is not as well-tested.
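The fallback behaviour described above can be sketched as follows. This is an illustration of the pattern only, not the actual body or signature of mamba_inner_fn. The helper name `resolve_B_C`, its parameters, and the `[delta | B | C]` layout of the projection output are assumptions.

```python
import numpy as np

def resolve_B_C(x, x_proj_weight, dt_rank, d_state, B=None, C=None):
    """Return (B, C), projecting from x only for whichever one is None.

    x:             (L, D)
    x_proj_weight: (dt_rank + 2 * d_state, D), assumed layout [delta | B | C]
    """
    x_dbl = x @ x_proj_weight.T                   # (L, dt_rank + 2 * d_state)
    if B is None:
        B = x_dbl[:, dt_rank:dt_rank + d_state]   # default: input-dependent B
    if C is None:
        C = x_dbl[:, dt_rank + d_state:]          # default: input-dependent C
    return B, C

rng = np.random.default_rng(0)
L, D, dt_rank, d_state = 5, 4, 2, 3
x = rng.standard_normal((L, D))
W = rng.standard_normal((dt_rank + 2 * d_state, D))

# Default path: both B and C come from the projection of x.
B_default, C_default = resolve_B_C(x, W, dt_rank, d_state)

# Override path: a caller-supplied B is used as-is; C still comes from x.
B_custom = np.ones((L, d_state))
B_used, C_used = resolve_B_C(x, W, dt_rank, d_state, B=B_custom)
```

Whether a custom B or C trains well is a separate question. As noted above, the override path is the less exercised one, so it is worth checking outputs and gradients against a slow reference before relying on it.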