goombalab / hydra

Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"

Memory-efficient implementation #5

Closed: alstonlo closed this issue 1 month ago

alstonlo commented 1 month ago

Hi authors,

Thanks for all the amazing work!

I was wondering if there were any plans to support a memory-efficient implementation of Hydra, similar to the mem_eff path in Mamba2. As a workaround, I have written a preliminary implementation by extending the mamba_split_conv1d_scan_combined() function used in Mamba2. I would be excited to contribute it through a draft PR, if it would be helpful!

sukjunhwang commented 1 month ago

We have worked on it internally, but we are not releasing the code because we haven't tested it extensively. We'd love the community's contributions! As a side note, the conv1d in Hydra is not causal but bidirectional, which differs from Mamba :)
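To make the causal vs. bidirectional conv1d distinction concrete, here is a minimal pure-Python sketch on a single channel. The causal variant uses left zero-padding only, so each output depends on the current and past inputs. The bidirectional variant uses symmetric zero-padding with an odd kernel, so each output also sees future inputs. The function names are hypothetical, and symmetric "same" padding is an assumption about one way to build a non-causal conv1d, not a description of Hydra's or Mamba's actual code.

```python
# Hypothetical illustration (not Hydra/Mamba source): causal vs. centered
# 1D convolution (cross-correlation, as in most DL frameworks) on one channel.

def conv1d_causal(x, w):
    """Causal conv: output[t] depends only on x[t-k+1 .. t] (left zero-padding)."""
    k = len(w)
    padded = [0.0] * (k - 1) + [float(v) for v in x]
    return [sum(w[j] * padded[t + j] for j in range(k)) for t in range(len(x))]


def conv1d_bidirectional(x, w):
    """Centered conv: output[t] depends on x[t-k//2 .. t+k//2] (symmetric zero-padding)."""
    k = len(w)
    if k % 2 == 0:
        raise ValueError("use an odd kernel size for symmetric padding")
    p = k // 2
    padded = [0.0] * p + [float(v) for v in x] + [0.0] * p
    return [sum(w[j] * padded[t + j] for j in range(k)) for t in range(len(x))]


if __name__ == "__main__":
    impulse = [0, 0, 1, 0, 0]  # single nonzero input at position 2
    kernel = [1, 2, 3]
    print(conv1d_causal(impulse, kernel))         # [0.0, 0.0, 3.0, 2.0, 1.0]
    print(conv1d_bidirectional(impulse, kernel))  # [0.0, 3.0, 2.0, 1.0, 0.0]
```

With the impulse at position 2, the causal output is zero before position 2, while the centered output is already nonzero at position 1, meaning it looks one step into the future. This look-ahead is presumably why the difference matters when adapting a fused kernel written for Mamba's causal conv.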

alstonlo commented 1 month ago

Thanks, I have created a PR!

sukjunhwang commented 1 month ago

Awesome, thank you for the PR!

alstonlo commented 1 month ago

Thanks for merging!

sukjunhwang commented 1 month ago

Thank you! Sorry for taking a while to merge.