OliverHxh opened 9 months ago
Your work is outstanding, and I admire the efficiency achieved in your mamba implementation.
However, I'm concerned about its accessibility and broader adoption compared with transformer-based methods, which are predominantly implemented in PyTorch. The barrier is that CUDA programming is far less familiar to the research community, whereas PyTorch is a standard platform that lets researchers easily tweak and experiment with transformer models, which has contributed to their widespread influence.
I understand that CUDA is used in mamba to optimize performance, but making the algorithm more approachable to modification could encourage a larger portion of the community to contribute to its research and development. Take, for instance, the discretization step, which is currently implemented in CUDA for efficiency. It could alternatively be expressed with straightforward PyTorch functions (see the sketch below), which would be more user-friendly. Extracting this portion into a PyTorch-based implementation would not only make mamba more accessible but also let researchers explore different discretization strategies more conveniently. Such a change could significantly enhance the collaborative potential and innovation around mamba.
I know that you implement a `selective_scan_ref` function in PyTorch, but it is much slower. I guess it would be better to achieve a good trade-off between readability and efficiency. Thanks!
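For concreteness, here is a minimal sketch of the zero-order-hold (ZOH) discretization that the selective scan performs, written in plain PyTorch. The tensor shapes follow the `selective_scan_ref` reference implementation (with input-dependent `B`); the function name `discretize_zoh` is just for illustration:

```python
import torch

def discretize_zoh(delta, A, B, u):
    """ZOH discretization of the continuous-time SSM parameters.

    delta: (batch, d_inner, seqlen)   per-token step sizes
    A:     (d_inner, d_state)         continuous-time state matrix (diagonal)
    B:     (batch, d_state, seqlen)   input-dependent input matrix
    u:     (batch, d_inner, seqlen)   input sequence
    """
    # A_bar = exp(delta * A), computed per (batch, channel, time, state)
    deltaA = torch.exp(torch.einsum('bdl,dn->bdln', delta, A))
    # B_bar * u approximated as delta * B * u (Euler step for the input term)
    deltaB_u = torch.einsum('bdl,bnl,bdl->bdln', delta, B, u)
    return deltaA, deltaB_u
```

Swapping in a different discretization rule (e.g. bilinear) would then be a one-line change in PyTorch, at the cost of materializing the (batch, d_inner, seqlen, d_state) tensors that the fused CUDA kernel currently avoids.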
I'm not affiliated, but take a look at https://github.com/johnma2006/mamba-minimal
@deroholic Thanks, but I want a version that implements the discretization in PyTorch while keeping the remaining operations in CUDA.
Thanks for the suggestion, we welcome contributions!
@OliverHxh do you have an idea of how to do the discretization in PyTorch while remaining efficient?
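For context on the efficiency question: the discretization itself is cheap elementwise work; the expensive part of `selective_scan_ref` is the sequential recurrence that consumes the discretized tensors. A naive PyTorch version of that loop, assuming the shapes from the sketch above (the function name is illustrative):

```python
import torch

def selective_scan_naive(deltaA, deltaB_u, C):
    """Sequential SSM recurrence: x_t = A_bar_t * x_{t-1} + (B_bar u)_t.

    deltaA, deltaB_u: (batch, d_inner, seqlen, d_state)
    C:                (batch, d_state, seqlen)
    """
    batch, d_inner, seqlen, d_state = deltaA.shape
    x = deltaA.new_zeros(batch, d_inner, d_state)  # hidden state
    ys = []
    for t in range(seqlen):
        x = deltaA[:, :, t] * x + deltaB_u[:, :, t]
        # y_t = C_t . x_t, contracting over the state dimension
        ys.append(torch.einsum('bdn,bn->bd', x, C[:, :, t]))
    return torch.stack(ys, dim=-1)  # (batch, d_inner, seqlen)
```

The Python-level loop over seqlen launches many small kernels, which is why the fused CUDA scan is so much faster even though the math is identical.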