jxiw/MambaInLlama
Official repository of "The Mamba in the Llama: Distilling and Accelerating Hybrid Models"
https://arxiv.org/abs/2408.15237
Apache License 2.0 · 127 stars · 8 forks
Issues (newest first)
#9 Training Slowdown for Llama3-Mamba2 · Codys12 · opened 2 days ago · 11 comments
#8 Mamba Model initialisation · aashay-sarvam · closed 4 days ago · 1 comment
#7 State size vs hidden size, finding N' · Codys12 · opened 4 days ago · 4 comments
#6 Could you share your code to generate "pseudo labels"? · tianshu-zhu · closed 4 days ago · 2 comments
#5 Shape mismatch & mamba-ssm==2.1.0 not working · Mooler0410 · closed 1 week ago · 10 comments
#4 Some notes on improving HF integration · NielsRogge · closed 1 week ago · 2 comments
#3 Request for implementation guidance on hardware-aware speculative decoding in Mamba models · adityakotha03 · opened 2 weeks ago · 2 comments
#2 Could you share the versions of accelerate and deepspeed? · hanlinxuy · closed 1 week ago · 1 comment
#1 Quick question: Have you ever tried distilling the instruction-tuned models? If so, do the instruction-following capabilities remain? · Mooler0410 · opened 1 month ago · 2 comments