ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
MIT License

FastSpeech2 trained using LibriTTS dataset #63

Open LEEYOONHYUNG opened 3 years ago

LEEYOONHYUNG commented 3 years ago

Hi, I'm Yoonhyung Lee, and I'm studying text-to-speech. Thank you for your nice implementation of FastSpeech 2. It helped me a lot in studying the model, but a question occurred to me.

According to the README.md, it seems that you have trained FastSpeech 2 on the LibriTTS dataset, but I cannot find the audio samples. Did you use all 585 hours of the dataset for training? How well does FastSpeech 2 work on a multi-speaker dataset?

ming024 commented 3 years ago

@LEEYOONHYUNG Oh, I just forgot to post the audio samples. I'll update the demo page some other day. Honestly speaking, the quality of the synthesized LibriTTS samples is not as good as the results on the single-speaker dataset. I guess it is because the environmental noise in the LibriTTS dataset is much more severe than in the LJSpeech dataset. It might be a good idea to apply some data cleaning tricks before training the TTS model.
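
For reference, here is a minimal sketch of one possible cleaning step along those lines: dropping LibriTTS clips with a low estimated signal-to-noise ratio and trimming surrounding silence before the usual preprocessing. This is not part of the repo; the SNR threshold, directory paths, and the `estimate_snr_db` helper are assumptions for illustration.

```python
# Sketch: filter noisy LibriTTS clips and trim silence before preprocessing.
# Assumptions: librosa/soundfile are installed, wavs live under
# LibriTTS/train-clean-360, and a 15 dB SNR cutoff is a reasonable start.
import glob
import os

import librosa
import numpy as np
import soundfile as sf

SNR_THRESHOLD_DB = 15.0  # hypothetical cutoff; tune on a held-out subset


def estimate_snr_db(wav: np.ndarray, top_db: float = 30.0) -> float:
    """Rough SNR estimate: energy of non-silent intervals vs. the rest."""
    intervals = librosa.effects.split(wav, top_db=top_db)
    if len(intervals) == 0:
        return float("-inf")  # no detectable speech at all
    speech = np.concatenate([wav[s:e] for s, e in intervals])
    mask = np.ones(len(wav), dtype=bool)
    for s, e in intervals:
        mask[s:e] = False
    noise = wav[mask]
    if len(noise) == 0:
        return float("inf")  # no non-speech frames to estimate noise from
    speech_power = float(np.mean(speech ** 2)) + 1e-10
    noise_power = float(np.mean(noise ** 2)) + 1e-10
    return 10.0 * float(np.log10(speech_power / noise_power))


for path in glob.glob("LibriTTS/train-clean-360/**/*.wav", recursive=True):
    wav, sr = librosa.load(path, sr=22050)
    if estimate_snr_db(wav) < SNR_THRESHOLD_DB:
        continue  # skip clips that look too noisy
    trimmed, _ = librosa.effects.trim(wav, top_db=30)
    out_path = path.replace("train-clean-360", "train-clean-360-filtered")
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    sf.write(out_path, trimmed, sr)
```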

LEEYOONHYUNG commented 3 years ago

I think it is quite natural that learning multi-speaker TTS is more difficult. Thank you for your reply :D

lkurlandski commented 3 years ago

@ming024 any chance you could post a pretrained model for the multi-speaker English dataset, LibriTTS?

Great work with this repo, and thanks in advance!