jxiw/MambaInLlama
Official repository of "The Mamba in the Llama: Distilling and Accelerating Hybrid Models"
https://arxiv.org/abs/2408.15237
Apache License 2.0 · 127 stars · 8 forks
Issues (newest first)
#9 Training Slowdown for Llama3-Mamba2 · Codys12 · opened 2 days ago · 11 comments
#8 Mamba Model initialisation · aashay-sarvam · closed 4 days ago · 1 comment
#7 State size vs hidden size, finding N' · Codys12 · opened 4 days ago · 4 comments
#6 Could you share your code to generate "pseudo labels"? · tianshu-zhu · closed 4 days ago · 2 comments
#5 Shape mismatch & mamba-ssm==2.1.0 not working · Mooler0410 · closed 1 week ago · 10 comments
#4 Some notes on improving HF integration · NielsRogge · closed 1 week ago · 2 comments
#3 Request for implementation guidance on hardware-aware speculative decoding in Mamba models · adityakotha03 · opened 2 weeks ago · 2 comments
#2 Could you share the versions of accelerate and deepspeed? · hanlinxuy · closed 1 week ago · 1 comment
#1 Quick question: Have you ever tried distilling the instruction-tuned models? If so, do the instruction-following capabilities remain? · Mooler0410 · opened 1 month ago · 2 comments