Open maedoc opened 8 months ago
"Based" is an apparently simpler approach with similar performance that one could use: https://www.together.ai/blog/based
Combining BitNet with a handful of the other techniques mentioned in https://www.answer.ai/posts/2024-03-06-fsdp-qlora.html could be useful for the larger deep models.
Adding some of the newer SSM-based mechanisms would also be of interest, and the following derivation would facilitate implementation:
https://srush.github.io/annotated-mamba/hard.html