mozilla / TTS

:robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Mozilla Public License 2.0

Vocoder comparison #219

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

Hi, just wanted to know what's the state of the different vocoders. I have several questions:

  1. In my tests, LWS is about 10x faster than Griffin-Lim (GL). Is this correct?

  2. To implement LWS, would it be enough to compute the mel and linear spectrograms with LWS and also use it to invert them?

  3. How would GL and LWS compare with WaveRNN in quality and inference time?

  4. Is there any update on the WORLD vocoder?

Thanks a lot.
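
One way to check a speed claim like the 10x figure in question 1 is a small timing harness. The sketch below is only an illustration: `fake_griffin_lim` and `fake_lws` are hypothetical stand-ins (not real vocoders), and the helper names are assumptions introduced here, to be replaced with the actual inversion functions under test.

```python
import statistics
import time


def time_call(fn, arg, repeats=5):
    """Return the median wall-clock seconds of fn(arg) over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def real_time_factor(seconds, n_samples, sample_rate):
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    return seconds / (n_samples / sample_rate)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Hypothetical stand-ins for the two inversion routines (NOT real vocoders).
def fake_griffin_lim(spec):
    # Iterative method: pretend each of 10 iterations touches every value.
    return sum(x * x for _ in range(10) for x in spec)


def fake_lws(spec):
    # Single pass over the data.
    return sum(x * x for x in spec)


spec = [0.1] * 1000  # placeholder input, not a real spectrogram
gl_seconds = time_call(fake_griffin_lim, spec)
lws_seconds = time_call(fake_lws, spec)
print(f"GL: {gl_seconds:.6f}s  LWS: {lws_seconds:.6f}s")
```

Reporting the median over several runs, plus the real-time factor for the generated audio length, makes the comparison less sensitive to one-off slowdowns than a single measurement.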

erogol commented 5 years ago
  1. LWS places constraints on parameters such as window size and hop size for a given sample rate.
  2. LWS works with the same inputs as GL.
  3. Both are worse in quality than WaveRNN, but much faster at run-time.
  4. Alas, no.

Good luck.
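
The parameter constraint from point 1 can be checked before training. This is a minimal sketch under a stated assumption: that the window size must be a whole multiple of the hop size. The thread does not spell out the exact rule, so verify it against the LWS documentation. The helper names and the example frame settings (50 ms window, 12.5 ms shift at 22050 Hz, and 1024/256 samples) are illustrative, not taken from this thread.

```python
def ms_to_samples(duration_ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples (truncating)."""
    return int(sample_rate * duration_ms / 1000)


def check_frame_params(win_length, hop_length):
    """Return a list of problems; an empty list means the pair passes.

    ASSUMPTION: the window must be a whole multiple of the hop.
    """
    problems = []
    if win_length <= 0 or hop_length <= 0:
        problems.append("window and hop sizes must be positive")
    elif win_length % hop_length != 0:
        problems.append(
            f"window {win_length} is not a multiple of hop {hop_length}"
        )
    return problems


sample_rate = 22050  # illustrative value

# Frame settings given in milliseconds rarely land on compatible sample counts.
win = ms_to_samples(50, sample_rate)
hop = ms_to_samples(12.5, sample_rate)
print(win, hop, check_frame_params(win, hop))

# Power-of-two settings given directly in samples pass the assumed check.
print(1024, 256, check_frame_params(1024, 256))
```

If the check fails, choosing the window and hop directly in samples is usually simpler than adjusting the millisecond values.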

mrgloom commented 5 years ago

https://github.com/r9y9/deepvoice3_pytorch uses LWS, and (this may be subjective) I think it has better quality than GL.

Also, here they use WORLD and their demos have good quality, but I can't find any repo that reproduces them: https://mtg.github.io/singing-synthesis-demos/

mrgloom commented 5 years ago

Also, here is my test of the WORLD vocoder, i.e. what it would sound like if smoothed features were predicted: https://github.com/mozilla/TTS/issues/9#issuecomment-497303645