state-spaces / mamba

Mamba SSM architecture
Apache License 2.0

Alternative implementation of Multi-Head Mamba #372

Open vidavakil opened 3 months ago

vidavakil commented 3 months ago

Hello,

First, congratulations on the release of Mamba 2.0.

I wanted to let you know that I have published a fork of Mamba 1.0 that, much like Mamba 2.0, adds support for multi-head SSMs, the associated block-diagonal matrices, scalar gating, pre-convolution V (values), and multi-head output projections. Mamba 2.0 was released while I was working on the fork, and I did not notice until I had finished.

Alternative implementation of multi-head Mamba

Tweaking and experimenting with Mamba's code, with its sophisticated CUDA kernels, has been a wonderful learning experience, and I wanted to thank the authors once again.

vidavakil commented 3 months ago

Hello again,

I notice that you still have the original version of Mamba in your repo. I would be more than happy to send a pull request with the above updates for multi-head support. I also have more recent changes to the original Mamba that support chunking and handle states and their gradients across chunks (and through the scan), which I can include in the pull request. Please let me know.

Otherwise, I can close this issue.

Best.