metavoiceio / metavoice-src

Foundational model for human-like, expressive TTS
https://themetavoice.xyz/
Apache License 2.0
3.7k stars 642 forks

Fine-tuning voice-cloning capability of metavoice #137

Open abhijeethp opened 4 months ago

abhijeethp commented 4 months ago

Hey team, can anyone help me understand the following regarding the metavoice model fine-tuning process? https://github.com/metavoiceio/metavoice-src/tree/main?tab=readme-ov-file#finetuning

Arman12345677 commented 4 months ago

Old man voice

lucapericlp commented 3 months ago

Hey @abhijeethp, sorry for only getting to this now. We've seen people finetune on training datasets made up of 5-10s audio chunks, though that's not a hard range. We're not calculating SiSNR as part of finetuning. Are you asking whether using the same audio is appropriate?

Re finetuning the voice cloning: you should be all good if you follow the finetuning guide with a solid dataset, play around with the hyperparameters, and then use a good reference clip at inference.
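
For the 5-10s chunking mentioned above, here is a minimal sketch of how one might plan chunk boundaries for a longer recording. It is not part of the metavoice finetuning code. The function name `plan_chunks` and its parameters are hypothetical, and it assumes an even split with no attention to silence or sentence boundaries, which a real dataset would probably want.

```python
import math


def plan_chunks(total_seconds, min_len=5.0, max_len=10.0):
    """Return (start, end) times that split a clip into equal chunks.

    Hypothetical helper: each chunk is between min_len and max_len seconds.
    Assumes min_len <= max_len / 2 so an even split always fits the range.
    """
    if min_len * 2 > max_len:
        raise ValueError("need min_len <= max_len / 2 for an even split")
    if total_seconds < min_len:
        return []  # clip is too short to form even one chunk
    # Fewest chunks that keep every chunk at or under max_len.
    n = math.ceil(total_seconds / max_len)
    step = total_seconds / n
    return [(i * step, (i + 1) * step) for i in range(n)]


if __name__ == "__main__":
    for start, end in plan_chunks(25.0):
        print(f"{start:.2f}s -> {end:.2f}s")
```

The boundaries can then be passed to whatever audio tool you already use to cut the files. Since the range isn't a hard one, `min_len` and `max_len` are just defaults to adjust.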