state-spaces / mamba

Mamba SSM architecture
Apache License 2.0

Is the mixing of SSD layers with Attention supported in this codebase? #564

Closed by MaximilienLeClei 2 weeks ago

MaximilienLeClei commented 2 weeks ago

I am referring to Section 9.2.3 of the Mamba-2 paper.

tridao commented 2 weeks ago

Yes, see this config: https://huggingface.co/state-spaces/mamba2attn-2.7b/blob/main/config.json
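
For readers skimming the linked file, here is a minimal sketch of how a hybrid config of this kind could be read. It assumes the config lists the block indices that use attention under a key named `attn_layer_idx` and names the SSM layer under `ssm_cfg`. Both key names and every value below are illustrative assumptions, not copied from the linked config.json, so check them against that file. The helper `layer_plan` is hypothetical and is not part of the mamba codebase.

```python
import json

# Hypothetical, scaled-down config for illustration only. The key names
# ("n_layer", "ssm_cfg", "attn_layer_idx", "attn_cfg") are assumptions about
# the config format, and the values are made up for this example.
EXAMPLE_CONFIG = json.loads("""
{
  "n_layer": 8,
  "ssm_cfg": {"layer": "Mamba2"},
  "attn_layer_idx": [2, 5],
  "attn_cfg": {"num_heads": 4}
}
""")


def layer_plan(config):
    """Return one mixer label per block: 'attention' or the SSM layer name."""
    attn_idx = set(config.get("attn_layer_idx", []))
    # Fall back to a neutral label if the config does not name the SSM layer.
    ssm_name = config.get("ssm_cfg", {}).get("layer", "ssm")
    return [
        "attention" if i in attn_idx else ssm_name
        for i in range(config["n_layer"])
    ]


plan = layer_plan(EXAMPLE_CONFIG)
for i, kind in enumerate(plan):
    print(f"block {i}: {kind}")
```

Under these assumptions, most blocks use the SSM layer and only the listed indices use attention, which is the kind of interleaving the question asks about.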

MaximilienLeClei commented 2 weeks ago

Thanks!