Voice models/training - Githubissues

myshell-ai / MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.

MIT License

3.98k stars, 476 forks

Voice models/training #95

Open JuniorGamingTime opened 2 months ago

JuniorGamingTime commented 2 months ago

So I have MeloTTS running in Docker faster than real time (GPU), with a FastAPI wrapper. But I was wondering, are there any other pretrained models to use? I want to be able to change the voice to a male one.

Also, for training my own model, how much data do I need? And when collecting the training data, should it only include the voice I want the end model to sound like?

I've never done anything like this, so I'm a bit lost!

louistiti commented 1 month ago

Hi, I needed a male voice for Leon, so now I'm training a model.

Basically you can follow the steps described here.

I have 1000 audio samples in total (one or two sentences per sample, each less than 10 seconds long). With a batch size of 8, training has reached epoch 740 as I write this comment; you can see the result via this link.

Here is how it sounds so far.
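
As a rough sanity check on those numbers, the amount of training can be expressed as optimizer steps. This is only a sketch: it assumes a single GPU, no gradient accumulation, that the last partial batch is kept, and that "740" is the epoch count. The function name `optimizer_steps` is made up for illustration, and the real trainer may batch samples differently (for example by length), so treat the result as an estimate.

```python
import math


def optimizer_steps(num_samples: int, batch_size: int, epochs: int) -> tuple[int, int]:
    """Return (steps per epoch, total steps) for a plain fixed-size batching scheme."""
    # One step per batch; a final partial batch still counts as a step.
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


if __name__ == "__main__":
    # Figures from the comment above: 1000 samples, batch size 8, epoch 740.
    per_epoch, total = optimizer_steps(num_samples=1000, batch_size=8, epochs=740)
    print(f"{per_epoch} steps per epoch, about {total} steps in total")
```

Under these assumptions, 1000 samples at batch size 8 come to 125 steps per epoch, so epoch 740 corresponds to roughly 92,500 steps.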

Swastik-Mantry commented 1 month ago

Hey @louistiti, could you please share the time taken / ETA for fine-tuning/training the model, along with your hardware details? My fine-tuning has been running for a couple of days and I can't figure out what the issue is. Thanks in advance.

louistiti commented 1 month ago

> Hey @louistiti, could you please share the time taken / ETA for fine-tuning/training the model, along with your hardware details? My fine-tuning has been running for a couple of days and I can't figure out what the issue is. Thanks in advance.

Sure! I already got decent quality after 36 hours of training. The full training took ~3.5 days to complete.

s-tweed commented 3 weeks ago

Sounds pretty impressive! What is your GPU setup? I want to fine-tune a custom voice but 3.5 days sounds like quite a while...

louistiti commented 3 weeks ago

> Sounds pretty impressive! What is your GPU setup? I want to fine-tune a custom voice but 3.5 days sounds like quite a while...

I have an RTX 3090.

s-tweed commented 3 weeks ago

Thanks for the info, gives me a point of reference. I really like the voice you trained, it's quite fun to hear. I hope to see your project continue to advance!

FabioAML commented 2 weeks ago

Hi @louistiti, have you modified training.py? For me it keeps looping after finishing the epochs.

louistiti commented 2 weeks ago

> Hi @louistiti, have you modified training.py? For me it keeps looping after finishing the epochs.

I only modified the config file to set the batch size to 8 to suit my hardware. I also noticed that once all iterations are over it loops indefinitely, so I just stop it manually 😅
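
For anyone who wants to script that config change instead of editing by hand, here is a minimal sketch. It assumes the training config is a JSON file with a `train` section containing a `batch_size` key; both the key layout and the `config.json` path are assumptions, so check them against your own config. The helper `set_batch_size` is hypothetical and not part of MeloTTS.

```python
import json
from pathlib import Path


def set_batch_size(config_path: str, batch_size: int) -> dict:
    """Rewrite the (assumed) train.batch_size entry of a JSON training config."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Assumed layout: {"train": {"batch_size": ...}, ...}; adjust if yours differs.
    config.setdefault("train", {})["batch_size"] = batch_size
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config


if __name__ == "__main__":
    # Hypothetical path; point this at the config file your training run uses.
    updated = set_batch_size("config.json", 8)
    print(updated["train"]["batch_size"])
```

A smaller batch size mainly trades GPU memory for more steps per epoch, so pick the largest value your card can hold.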

FabioAML commented 2 weeks ago

Hahaha, thanks. I was so desperate trying to find the loop that I didn't think for a second about stopping it manually.