Closed gachaun closed 9 months ago

I used the pre-trained LJSpeech model with the emotional reference audio from the https://styletts.github.io/ web page, but the sound quality of the generated audio is worse than the results on that page. Why is this?

It is a different model. The demo was generated using models trained with the g2p_en library, but the released checkpoint was trained using phonemizer. If you need the old model, please let me know. I think the main reason is that the data generated with phonemizer was buggy: https://github.com/yl4579/StyleTTS2/issues/108

I want to try fine-tuning on my own data starting from a pre-trained model; could you send me a link to the old model?

@gachaun The LJSpeech model is not suitable for fine-tuning because it has only one speaker. If you want to fine-tune, you will have to use the LibriTTS model, which has the same quality as the demo page.

Ok, I see. Thanks.

See https://github.com/yl4579/StyleTTS2, which has better quality and a fine-tuning script.