state-spaces / mamba

Mamba SSM architecture
Apache License 2.0

MuTransfer for Mamba #379

Open norikazu99 opened 2 months ago

norikazu99 commented 2 months ago

Hello, I have been training models with Mamba (v1) and am enjoying it. I would like to use MuTransfer with Mamba. Should I scale only the width parameters (the matrix dimensions and the conv dimension), or are there other constants that need to be scaled, as with attention_scores in Transformers?

tridao commented 2 months ago

We don't have much experience with MuTransfer.