yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MIT License
4.38k stars, 340 forks

Fine-tune on LJSpeech or LibriTTS? #224

Closed: Weroxig closed this issue 3 months ago

Weroxig commented 3 months ago

StyleTTS 2 surpasses human recordings on the single-speaker LJSpeech dataset and matches it on the multispeaker VCTK dataset as judged by native English speakers. Moreover, when trained on the LibriTTS dataset, our model outperforms previous publicly available models for zero-shot speaker adaptation.

The abstract seems to suggest that the LJSpeech model is better. Should I fine-tune from the LJSpeech model or the LibriTTS model?

Weroxig commented 3 months ago

Moving this to Discussions.