jaywalnut310 / glow-tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search
MIT License
660 stars 151 forks

Sharing Korean Glow-TTS Samples #27

Open Joovvhan opened 4 years ago

Joovvhan commented 4 years ago

Dear contributors,

Thank you for sharing your great work.

I have successfully reproduced your result with the LJSpeech Dataset.

In addition, I have trained your model on the Korean Single Speaker Speech (KSS) dataset, using the G2PK grapheme-to-phoneme conversion module as a Korean cleaner.
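
For readers who want to try something similar, a cleaner of this kind might look roughly like the sketch below. It is not the code used for the demo. The names `korean_cleaners`, `load_g2pk`, and `collapse_whitespace` are hypothetical, and the sketch assumes the g2pK package exposes a `G2p` class whose instances can be called on a string. The converter is passed in as an argument so the cleaner still runs (as plain whitespace normalization) when g2pK is not installed.

```python
import re
from typing import Callable, Optional

_whitespace_re = re.compile(r"\s+")


def collapse_whitespace(text: str) -> str:
    """Replace runs of whitespace with a single space and trim both ends."""
    return _whitespace_re.sub(" ", text).strip()


def load_g2pk() -> Optional[Callable[[str], str]]:
    """Return a g2pK converter if it can be created, otherwise None.

    Assumption: g2pK provides `G2p`, and `G2p()(text)` returns a string.
    """
    try:
        from g2pk import G2p  # third-party package, may need extra setup
        return G2p()
    except Exception:
        # Package missing or its dependencies are not available.
        return None


def korean_cleaners(text: str, g2p: Optional[Callable[[str], str]] = None) -> str:
    """Hypothetical cleaner: normalize whitespace, then apply G2P if given."""
    text = collapse_whitespace(text)
    if g2p is not None:
        # Convert written Korean into its pronounced form.
        text = g2p(text)
    return collapse_whitespace(text)
```

A call such as `korean_cleaners(sentence, g2p=load_g2pk())` would then be applied to each transcript line before the text is turned into symbol IDs. Registering the function under the cleaner names in the training config is left out here because it depends on the repository layout.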

This is the link to the demo page.

I would be glad if you mentioned my demo page in your README.

Thanks again for your great code.

dathudeptrai commented 4 years ago

@Joovvhan Hi, I have also trained on the Korean KSS dataset and made a Colab here (https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). It uses FastSpeech2 + MB-MelGAN. I'm not a native speaker, so could you compare Glow-TTS + WaveGlow with our FastSpeech2 + MB-MelGAN? Thanks.

Joovvhan commented 4 years ago

@dathudeptrai Yes, I would be glad to. From a quick look through your samples, your pre-trained model on the Colab is quite impressive.

However, since I am not the original author of Glow-TTS, I have not tuned any hyperparameters or introduced any audio processing techniques to improve the audio quality.

I think it would make more sense to first apply to the Glow-TTS model the same techniques that were used in training or synthesizing your samples (FastSpeech2 + MB-MelGAN), and then compare the samples.

In addition, it seems that your pretrained MB-MelGAN is better than the officially provided universal WaveGlow model at generating Korean speech. I found that WaveGlow produces some screeching sounds when regenerating audio from ground-truth spectrograms. I have not found any in your samples yet.

I will share the link to the audio comparison page with you when I am done.

It would be better to continue any further discussion on your issue page.

Thanks.

Joovvhan commented 4 years ago

Dear authors,

I have improved my demo page by replacing the WaveGlow vocoder with the Multi-band MelGAN (MB-MelGAN) vocoder provided by the TensorFlowTTS authors.

I found out that the official universal WaveGlow vocoder is not so universal for the Korean language.

This is the link to the webpage.

I will leave the earlier, lower-quality sample page unchanged for anyone who would like to compare the effect of the vocoder.

Joovvhan commented 4 years ago

Dear contributors,

I have applied G2PK, a grapheme-to-phoneme conversion package, and achieved improved Korean TTS results.

This is the link to the demo page.

Since the original paper used phoneme tokens as inputs, I believe this result is closer to the intention of your original work.

Thanks.

v-nhandt21 commented 3 years ago

Hi Joovvhan, I found your demo to be very good. Regarding G2PK for Korean, how do you handle out-of-vocabulary words, such as English words or the names of unfamiliar places?