Passing an initial_conv_state in mamba_split_conv1d_scan_combined?

state-spaces / mamba

Mamba SSM architecture

Apache License 2.0

Passing an initial_conv_state in mamba_split_conv1d_scan_combined? #460

Open h-zhao1997 opened 2 months ago

h-zhao1997 commented 2 months ago

Thank you for your outstanding work! I'm curious if you've thought about including an additional parameter in the mamba_split_conv1d_scan_combined function to accept an initial_conv_state. This could open up some intriguing applications, like treating initial_conv_state as a trainable parameter. I've observed that Mamba2 has already implemented the ability to pass initial_states for the SSM layer. In your opinion, would it be beneficial to adopt a similar strategy for the 1D convolution layer?

tridao commented 2 months ago

Yes, this is a good idea. The conv1d implementation actually already supports taking in initial states and returning final states. We just haven't had time to wire everything together.
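For readers unfamiliar with what "initial states" and "final states" mean for a causal conv1d, here is a minimal pure-Python sketch for a single channel. It is not the library's kernel or its API: the function name `causal_conv1d_ref` and the parameter `initial_state` are hypothetical, chosen only for illustration. The assumption is that the state is simply the last K-1 inputs seen before the current chunk, where K is the kernel width.

```python
def causal_conv1d_ref(x, weight, initial_state=None):
    """Single-channel causal 1D convolution that carries state across calls.

    Hypothetical reference sketch, not the library's implementation.

    x:             list of floats, length L (the current chunk)
    weight:        list of floats, length K; weight[-1] multiplies the
                   current input, weight[0] the oldest input in the window
    initial_state: list of K-1 floats that precede x, or None for zeros
    Returns (y, final_state), where final_state holds the last K-1 inputs.
    """
    k = len(weight)
    if initial_state is None:
        # Default behaviour: implicit zero padding on the left.
        initial_state = [0.0] * (k - 1)
    if len(initial_state) != k - 1:
        raise ValueError("initial_state must have length len(weight) - 1")

    # Prepend the state so each output sees exactly K inputs.
    padded = list(initial_state) + list(x)
    y = [
        sum(w * padded[t + j] for j, w in enumerate(weight))
        for t in range(len(x))
    ]
    # The last K-1 inputs become the state for the next chunk.
    final_state = padded[len(padded) - (k - 1):]
    return y, final_state


# Processing a sequence in two chunks, passing the state along,
# should match processing it in one call.
weight = [0.5, -1.0, 2.0, 0.25]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

y_full, state_full = causal_conv1d_ref(x, weight)
y_a, state_a = causal_conv1d_ref(x[:2], weight)
y_b, state_b = causal_conv1d_ref(x[2:], weight, initial_state=state_a)
```

In this sketch, `y_a + y_b` equals `y_full`, which is the property that makes chunked or stateful processing possible. A trainable initial state, as proposed above, would amount to replacing the zero default with a learned vector of length K-1 per channel.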

h-zhao1997 commented 2 months ago

@tridao Thank you very much for your reply! I'm really looking forward to seeing this feature integrated!