ibab / tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper
MIT License
5.41k stars 1.29k forks

Text-to-speech #276

Open aelbialy-tbox opened 7 years ago

aelbialy-tbox commented 7 years ago

Is there currently a way to make my model's generated voice say a specific word or sentence that I give it? If not, how could I implement this? Are there any resources or repos that would help with that?

greigs commented 7 years ago

Try Tacotron.

lemonzi commented 7 years ago

Not with the current implementation; this has been discussed in the past (see #252). WaveNet is not the best choice for "raw" text-to-speech anyway (Tacotron is indeed better), as it requires a lot of auxiliary components (the speech frontend) to make it work. If you want to see what a full TTS pipeline looks like, try Merlin. WaveNet is still great for other tasks, though (as a music encoder, as a time series model for other data, as a "decoder" for audio spectra...).

tibbon commented 6 years ago

Is that still the state of it? I know there have been a lot of WaveNet changes in the past month or so, and Google is implementing it for their voice assistant.

burakipekk commented 6 years ago

If we can't give it words or sentences to generate sound from, what happens when we run generate.py? What is the content of the generated voice?

Kungergely commented 6 years ago

Most probably garbled speech (resembling some foreign language, but not forming meaningful words in any language), similar to the samples at https://deepmind.com/blog/wavenet-generative-model-raw-audio/ in the section "Knowing What to Say".
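To illustrate why the output has no particular content, here is a toy sketch of unconditioned autoregressive sampling. It is not the code in generate.py: `toy_next_sample_probs` is a hypothetical stand-in for the trained network, and the 256 quantization levels and the window size are assumptions made for the example. The point is only that each new sample is drawn from a distribution that depends on previous samples and nothing else, so there is no place for text to enter.

```python
import numpy as np

QUANT_LEVELS = 256  # assumption: audio quantized to 256 discrete levels


def toy_next_sample_probs(history, receptive_field=16):
    """Hypothetical stand-in for the trained network.

    Returns a probability distribution over the next quantized sample,
    computed from past samples only (no text or other conditioning input).
    """
    window = history[-receptive_field:]
    center = float(np.mean(window))
    levels = np.arange(QUANT_LEVELS)
    # Bell-shaped scores centred on the recent average level.
    logits = -0.5 * ((levels - center) / 8.0) ** 2
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()


def generate(num_samples, seed=0):
    """Draw samples one at a time, feeding each one back in as history."""
    rng = np.random.default_rng(seed)
    waveform = [QUANT_LEVELS // 2]  # start from the mid ("silence") level
    for _ in range(num_samples):
        probs = toy_next_sample_probs(np.array(waveform))
        waveform.append(int(rng.choice(QUANT_LEVELS, p=probs)))
    return np.array(waveform[1:])  # drop the initial seed value


if __name__ == "__main__":
    print(generate(20))
```

A real model replaces the toy function with the network's softmax output, but the loop has the same shape. Making it say a given sentence would mean feeding extra conditioning features into that function at every step, which is what the speech frontend mentioned above would have to provide.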

neil-119 commented 6 years ago

I'd like to know as well, especially after that Google I/O conference.