state-spaces / mamba

Mamba SSM architecture

Training mamba model with Huggingface transformers from scratch #256

Open FIlipHand opened 5 months ago

FIlipHand commented 5 months ago

Hi everyone, I want to create a new Mamba model to use in my project for generating new sequences. I have my own dataset based on MIDI files, so I'm not keen on using a pretrained model, and I want to experiment with the number of parameters. For ease of coding, I wanted to use Hugging Face transformers, since I saw that it is possible to fine-tune a model from a checkpoint. Is it possible to train one from scratch? I noticed that MambaForCausalLM inherits from MambaPreTrainedModel, which inherits from PreTrainedModel in the transformers code, but everyone seems to just fine-tune the model and load it with the from_pretrained method, never training it from scratch.

Thanks for any answers!
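(For reference, a minimal sketch of the config-based instantiation being asked about, assuming a transformers release that ships MambaConfig and MambaForCausalLM (roughly 4.39 or later); the sizes below are placeholders, not values from any pretrained checkpoint.)

```python
from transformers import MambaConfig, MambaForCausalLM

# Build a randomly initialized model from a config instead of calling from_pretrained.
# All sizes here are placeholders to be replaced with your own choices.
config = MambaConfig(
    vocab_size=1024,        # e.g. the size of a custom MIDI-token vocabulary
    hidden_size=512,
    num_hidden_layers=12,
    state_size=16,
)
model = MambaForCausalLM(config)  # fresh weights, no checkpoint involved

# Sanity check: count parameters to confirm the model is the size you intended.
print(sum(p.numel() for p in model.parameters()))
```

Calling from_pretrained on a checkpoint would load existing weights instead; the config route is what gives full control over the parameter count.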

tridao commented 5 months ago

I'm not familiar with the implementation in HF. In any case, it's just model code, so I think training from scratch should work.

yangzhao1230 commented 4 months ago

Hi @FIlipHand,

I'm interested in training the Mamba model from scratch using Hugging Face transformers. Have you had success with this approach?

Thanks!

FIlipHand commented 3 months ago

Hello @yangzhao1230. Yes, I managed to train a Mamba model with Hugging Face, and it is actually quite easy. Take a look at #294, where someone posted sample code for training from a checkpoint. The only difference is that instead of passing the model checkpoint as a string, you pass a MambaConfig object with your own parameters; the rest is the same. I personally haven't been able to scale it properly on my hardware and train it for language modeling, but friends of mine used smaller Mamba models for tasks like phonemization, and it worked well.
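(For anyone landing here later, a rough sketch of that approach under a few assumptions: a transformers version with Mamba support, a stand-in tokenizer and a tiny dummy dataset in place of a real MIDI-token corpus, and hyperparameters chosen purely for illustration.)

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    MambaConfig,
    MambaForCausalLM,
    Trainer,
    TrainingArguments,
)

# Stand-in tokenizer; for MIDI data you would substitute your own vocabulary/tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer.pad_token = tokenizer.eos_token

# The only change versus the fine-tuning recipe in #294: build the model from a
# MambaConfig instead of calling MambaForCausalLM.from_pretrained("<checkpoint>").
config = MambaConfig(vocab_size=len(tokenizer), hidden_size=512, num_hidden_layers=12)
model = MambaForCausalLM(config)

# Tiny dummy corpus standing in for a real tokenized dataset.
texts = ["example sequence"] * 64
train_dataset = Dataset.from_dict(dict(tokenizer(texts)))

args = TrainingArguments(
    output_dir="mamba-from-scratch",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-4,
    logging_steps=10,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Swapping in a real tokenized dataset (and a tokenizer matched to your MIDI vocabulary) is the only part that is specific to the use case.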