yl4579 / StyleTTS

Official Implementation of StyleTTS
MIT License

Pre-trained model sound quality issues #64

gachaun closed this issue 9 months ago

gachaun commented 9 months ago

I used the pre-trained LJSpeech model with the emotional reference audio from the https://styletts.github.io/ demo page, but the generated audio sounds worse than the samples on that page. Why is this?

yl4579 commented 9 months ago

It is a different model. The demo samples were generated with models trained on phonemes from the g2p_en library, but the released checkpoint was trained with phonemizer. If you need the old model, please let me know. I think the main reason for the quality gap is that the data generated with phonemizer was buggy: https://github.com/yl4579/StyleTTS2/issues/108
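To make the g2p_en vs. phonemizer distinction concrete: the two front-ends emit different symbol sets (g2p_en produces ARPAbet tokens with stress digits, such as `AH0`, while phonemizer with the espeak backend produces IPA characters), so a text encoder trained on one cannot directly consume the other's output. The toy sketch that follows is only an illustration under stated assumptions: `IPA_SYMBOLS` is a made-up vocabulary, the phoneme strings are hand-written in the style of each library, and neither library is actually called. It does not reproduce the real StyleTTS symbol table.

```python
"""Toy sketch: a checkpoint only understands the phoneme symbols it was trained on.

Assumptions (hypothetical, not from StyleTTS): the vocabulary below stands in for
the text-encoder vocabulary of a model trained on IPA (phonemizer/espeak-style)
input, and the phoneme sequences are hand-written rather than produced by
g2p_en or phonemizer.
"""

from typing import Dict, List, Tuple

# Hypothetical IPA-style vocabulary: punctuation, stress marks, and IPA letters.
IPA_SYMBOLS = list(" ,.!?ˈˌːaæbdefhijklmnoprstuvwzðŋɑɔəɛɪʃʊʌʒθɹ")
IPA_VOCAB: Dict[str, int] = {sym: idx for idx, sym in enumerate(IPA_SYMBOLS)}


def encode(tokens: List[str], vocab: Dict[str, int]) -> Tuple[List[int], List[str]]:
    """Map tokens to integer ids; return the ids plus any tokens missing from vocab."""
    ids: List[int] = []
    unknown: List[str] = []
    for tok in tokens:
        if tok in vocab:
            ids.append(vocab[tok])
        else:
            unknown.append(tok)
    return ids, unknown


# "the cat" written two ways (hand-written, not library output):
ipa_tokens = list("ðə kˈæt")                           # espeak-style IPA characters
arpabet_tokens = ["DH", "AH0", " ", "K", "AE1", "T"]   # g2p_en-style ARPAbet tokens

ipa_ids, ipa_unknown = encode(ipa_tokens, IPA_VOCAB)
arp_ids, arp_unknown = encode(arpabet_tokens, IPA_VOCAB)

print("IPA tokens not in vocab:", ipa_unknown)
print("ARPAbet tokens not in vocab:", arp_unknown)
```

Every IPA character maps to an id, while almost every ARPAbet token falls outside the vocabulary. This is one reason a model trained on one front-end is effectively a different model from one trained on the other, independent of the phonemizer data bug mentioned above.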

gachaun commented 9 months ago

I want to try fine-tuning a pre-trained model on my own data. Can you send me a link to the old model?

yl4579 commented 9 months ago

@gachaun The LJSpeech model is not suitable for finetuning because it only has one speaker. If you want to finetune, you will have to use the LibriTTS model, which has the same quality as the demo page.

gachaun commented 9 months ago

OK, I see. Thanks.

yl4579 commented 9 months ago

See https://github.com/yl4579/StyleTTS2, which has better quality and a fine-tuning script.