goombalab/phi-mamba
Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models)
https://arxiv.org/abs/2408.10189
68 stars, 3 forks
issues
input to student layer is the hidden from the previous one?
#4
tGhattas
closed
1 day ago
1
Very high loss for stage 1
#3
ostix360
opened
3 days ago
5
Batch size
#2
tGhattas
opened
4 days ago
1
Great paper! can you please release the training source code please?
#1
tGhattas
closed
1 week ago
5