goombalab / phi-mamba

Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models)
https://arxiv.org/abs/2408.10189
68 stars 3 forks source link