goombalab/phi-mamba
Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models)
https://arxiv.org/abs/2408.10189
77 stars, 4 forks
issues
Missing Stage 3 Code for Weight Transfer and Knowledge Distillation
#8
L-z-Chen
opened
18 hours ago
0
Clarification on the training schedule used for the final model
#7
juankost
closed
1 week ago
2
Transformers library version? error in evaluation
#6
juankost
closed
1 week ago
1
Hybrid Phi-Mamba model weights
#5
juankost
closed
1 week ago
2
input to student layer is the hidden from the previous one?
#4
tGhattas
closed
1 month ago
1
Very high loss for stage 1
#3
ostix360
closed
1 month ago
5
Batch size
#2
tGhattas
closed
1 day ago
3
Great paper! can you please release the training source code please?
#1
tGhattas
closed
1 month ago
5