tag = "kan-bayashi/jsut_fastspeech2" trivial problem

kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch

https://kan-bayashi.github.io/ParallelWaveGAN/

MIT License


tag = "kan-bayashi/jsut_fastspeech2" trivial problem #228

Closed ycat3 closed 4 years ago

ycat3 commented 4 years ago

espnet2_tts_demo.ipynb shows a strange problem, both on Google Colab and in my local notebook. When I input the following sentence, "この谿谷の、最も深いところには、木曽福島の関所も、隠れていた。", the last character is missing: no sound is produced for it.

tag = "kan-bayashi/jsut_fastspeech2"
vocoder_tag = "jsut_multi_band_melgan.v2"

Other tags, including "kan-bayashi/jsut_conformer_fastspeech2", do not show the problem.

kan-bayashi commented 4 years ago

Sometimes the duration predictor fails to predict durations correctly. That may be the cause of this failure. With the conformer, the convolution module can capture local context better than the standard Transformer, and therefore conformer-fastspeech2 is more robust to the input text content.
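To illustrate why a bad duration prediction can drop a character: in FastSpeech-style models, a length regulator repeats each input token's hidden state for the predicted number of frames, so a token whose predicted duration comes out as 0 yields no frames and hence no sound. Below is a minimal pure-Python sketch of that mechanism. It is not ESPnet's actual code; `length_regulate`, the token list, and the duration values are all hypothetical and chosen only for illustration.

```python
def length_regulate(tokens, durations):
    """Repeat each token durations[i] times (tokens stand in for hidden states)."""
    if len(tokens) != len(durations):
        raise ValueError("tokens and durations must have the same length")
    frames = []
    for token, d in zip(tokens, durations):
        # A non-positive duration contributes no frames at all.
        frames.extend([token] * max(int(d), 0))
    return frames


# Hypothetical tokens and durations, for illustration only.
tokens = ["i", "t", "a"]
ok_durations = [3, 2, 4]
bad_durations = [3, 2, 0]  # assumed failure: final token predicted as 0 frames

ok_frames = length_regulate(tokens, ok_durations)
bad_frames = length_regulate(tokens, bad_durations)

# Tokens that never appear in the expanded sequence are silent in the output.
missing = [t for t in tokens if t not in bad_frames]
print(len(ok_frames), len(bad_frames), missing)
```

If the real model behaves like this sketch, inspecting the predicted durations for the failing sentence (and checking whether the last token gets 0 frames) would be one way to confirm the explanation.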

kan-bayashi commented 4 years ago

I also confirmed the above behavior with jsut_fastspeech2. Interestingly,

この谿谷の、最も深いところには、木曽福島の関所も、隠れていた fails
この谿谷の、最も深いところには、福島の関所も、隠れていた is OK
この木曽の、最も深いところには、木曽福島の関所も、隠れていた is OK

The Transformer behavior is mysterious :(

kan-bayashi commented 4 years ago

If you would like to discuss this further, please re-open the issue.