Open vidavakil opened 3 months ago
Hello again,
I notice you still have the original version of Mamba in your repo. I would be more than happy to send a pull request with the above updates for multi-head support. I also have more recent changes to the original Mamba that support chunking and handle states and their gradients across chunks (and through the scan), which I can include in the pull request. Please let me know.
Otherwise, I can close this issue.
Best.
Hello,
First, congratulations on the release of Mamba 2.0.
I wanted to let you know that I have published a fork of Mamba 1.0 that, much like Mamba 2.0, happens to add support for multi-head SSMs, the associated block-diagonal matrices, scalar gating, pre-convolution V (values), and multi-head output projections. Mamba 2.0 was released between the time I started working on that fork and the time I finished, and I had missed it.
Alternative implementation of multi-head Mamba
Tweaking and experimenting with Mamba's code, with its sophisticated CUDA kernels, has been a wonderful learning experience, and I wanted to thank the authors once again.