huggingface / parler-tts

Inference and training library for high-quality TTS models.
Apache License 2.0
4.34k stars 440 forks

How big a dataset is needed to train the model? #83

Open zyy-fc opened 3 months ago

zyy-fc commented 3 months ago

I used 560+ hours of libritts_R data to train the model (187M) from scratch, but the audio synthesized by the model is not correct.

Is this because the size of the dataset is not enough?

ScottishFold007 commented 2 months ago

> I used 560+ hours of libritts_R data to train the model (187M) from scratch, but the audio synthesized by the model is not correct.
>
> Is this because the size of the dataset is not enough?

People train with tens of thousands of hours of data, but you have less than 600 hours of audio and expect excellent results? That's just nonsense, isn't it?

gantuo commented 2 weeks ago

> I used 560+ hours of libritts_R data to train the model (187M) from scratch, but the audio synthesized by the model is not correct. Is this because the size of the dataset is not enough?
>
> People train with tens of thousands of hours of data, but you have less than 600 hours of audio and expect excellent results? That's just nonsense, isn't it?

Bro, have you ever trained the model from scratch yourself? Could you please tell me your final train loss and eval loss? I trained on a 600-hour dataset and got a loss of about 4.1, so of course the model can't produce any useful speech... Thanks very much.
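For anyone sanity-checking their own setup, a quick back-of-the-envelope duration count helps put "600 hours" in context against the tens of thousands of hours mentioned above. A minimal sketch (the function name and the example clip counts are hypothetical; 24 kHz matches LibriTTS-R's sampling rate):

```python
def total_hours(samples_per_clip, sampling_rate):
    """Sum clip lengths given in raw sample counts and convert to hours."""
    total_seconds = sum(n / sampling_rate for n in samples_per_clip)
    return total_seconds / 3600.0

# Hypothetical example: 10,000 clips of ~8 s each at 24 kHz
clips = [8 * 24_000] * 10_000
print(f"{total_hours(clips, 24_000):.1f} hours")  # ~22.2 hours
```

Running this over your actual dataset's audio lengths tells you exactly how far short of the multi-thousand-hour regime you are before spending GPU time on a from-scratch run.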