skirdey / voicerestore

VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration
MIT License

Fine-tuning Colab Guide #8

Open yukiarimo opened 3 weeks ago

yukiarimo commented 3 weeks ago

Hello, I’m super excited about this model. Do you have a step-by-step tutorial on how to fine-tune this model on my custom input-output audio pairs?

skirdey commented 3 weeks ago

Appreciate your interest! Let me create a Colab example; I will share it here.

yukiarimo commented 3 weeks ago

Sure! Ping me when you are done!

skirdey commented 1 week ago

Not exactly a Colab notebook, but this is the code that was used for training: https://gist.github.com/skirdey/4c90202ee4aa753a0184f4366953b60a. You can copy and paste it, and with minor tweaks it should work.

yukiarimo commented 1 week ago

Oh, my! Thank you so much.

Could you please explain how to use it and what my dataset should look like? Also, can I begin with your model instead of starting from scratch?

Also, is 1xA100 sufficient, and how long would it take to train?

skirdey commented 1 week ago

1xA100 should be more than enough! As for the code, it goes like this: train.py and train_example.py (you run it as accelerate launch train_example.py), plus the forward function, which goes into the model next to the sample function.

If you have these files and run accelerate launch, it should start training from scratch. For your own dataset, you can provide just clean audio and use the built-in degradations, or come up with your own degradations.
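
To illustrate the "come up with your own degradations" option, here is a minimal sketch of turning a clean clip into a (degraded, clean) training pair. It is not the repository's built-in degradation code: the functions `degrade` and `make_pair`, the choice of additive white noise followed by hard clipping, and all parameter values are assumptions made for illustration only.

```python
from typing import Optional, Tuple

import numpy as np


def degrade(
    clean: np.ndarray,
    snr_db: float = 10.0,
    clip_level: float = 0.5,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Hypothetical degradation: white noise at a target SNR, then hard clipping."""
    rng = rng if rng is not None else np.random.default_rng()
    # Scale the noise power so that signal_power / noise_power matches snr_db.
    signal_power = float(np.mean(clean.astype(np.float64) ** 2)) + 1e-12
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    noisy = clean + noise
    # Hard-clip to simulate an overdriven recording.
    return np.clip(noisy, -clip_level, clip_level).astype(np.float32)


def make_pair(
    clean: np.ndarray, rng: Optional[np.random.Generator] = None
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (model input, training target) built from one clean clip."""
    return degrade(clean, rng=rng), clean


# Stand-in for a real recording: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = (0.8 * np.sin(2.0 * np.pi * 440.0 * t)).astype(np.float32)

degraded, target = make_pair(clean, rng=np.random.default_rng(0))
```

With this shape of data, only clean audio needs to be stored on disk; the degraded input is generated on the fly, so each epoch can see a different corruption of the same clip.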

For fine-tuning, you can use the same checkpoint that was shared for inference; it should work.