Open · FIlipHand opened 5 months ago
Hi everyone, I want to create a new Mamba model for my project, which generates new sequences. I have my own dataset based on MIDI files, so I'm not keen on using a pretrained model, and I want to experiment with the number of parameters. For ease of coding, I wanted to use Hugging Face transformers, as I saw that it is possible to fine-tune a model from a checkpoint. Is it possible to do it from scratch? I noticed that `MambaForCausalLM` inherits from `MambaPreTrainedModel`, which inherits from `PreTrainedModel` in the transformers code, yet everyone was just fine-tuning the model and loading it with the `from_pretrained` method, never training it from scratch. Thanks for answers!

I'm not familiar with the implementation in HF. In any case it's just some model code, so I think training from scratch should work.
Hi @FIlipHand,
I'm interested in training the Mamba model from scratch using Hugging Face transformers. Have you had success with this approach?
Thanks!
Hello @yangzhao1230,
Yes, I managed to train a Mamba model with Hugging Face. It is actually quite easy. You can take a look at #294, where someone posted sample code for training from a checkpoint. The only difference is that instead of passing the model checkpoint as a string, you pass a `MambaConfig` object with your parameters; the rest is the same. I personally haven't been able to scale it properly on my hardware and train it for LM purposes, but I have friends who used smaller Mamba models for tasks like phonemization, and it was successful.
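In case it helps the next person: here is a minimal sketch of that approach, assuming transformers >= 4.39 (the first release with Mamba support) plus the `datasets` library. The `MambaConfig` sizes, the borrowed tokenizer, and the two-line dummy corpus are placeholders of mine, not the code from #294:

```python
# Minimal sketch: training Mamba from scratch with Hugging Face transformers.
# Assumes transformers >= 4.39 and `datasets`; all sizes are placeholders.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    MambaConfig,
    MambaForCausalLM,
    Trainer,
    TrainingArguments,
)

# Instead of MambaForCausalLM.from_pretrained("state-spaces/mamba-130m-hf"),
# build a MambaConfig with your own parameters and instantiate the model
# from it: this gives randomly initialized weights, i.e. no pretraining.
config = MambaConfig(
    vocab_size=50280,      # must match your tokenizer's vocabulary
    hidden_size=512,       # shrink/grow these to change the parameter count
    num_hidden_layers=12,
)
model = MambaForCausalLM(config)

# The mamba-130m tokenizer is reused here only for demonstration; for MIDI
# data you would plug in your own tokenizer and adjust vocab_size above.
tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
tokenizer.pad_token = tokenizer.eos_token

# Tiny dummy corpus so the script runs end to end.
raw = Dataset.from_dict({"text": ["example sequence one", "example sequence two"]})
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mamba-from-scratch",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        logging_steps=1,
    ),
    train_dataset=train_dataset,
    # mlm=False makes the collator produce causal-LM labels from input_ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The key point is constructing `MambaForCausalLM(config)` directly rather than calling `from_pretrained`: that is what gives you randomly initialized weights, and tuning `hidden_size` and `num_hidden_layers` is how you play with the parameter count.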