The new, improved WaveNet model still generates a raw waveform but at speeds 1,000 times faster than the original model, meaning it requires just 50 milliseconds to create one second of speech. In fact, the model is not just quicker, but also higher-fidelity, capable of creating waveforms with 24,000 samples a second. We have also increased the resolution of each sample from 8 bits to 16 bits, the same resolution used in compact discs.
It is also now capable of running at scale and is the first product to launch on Google’s latest TPU cloud infrastructure.
https://deepmind.com/blog/wavenet-launches-google-assistant/
The new, improved WaveNet model still generates a raw waveform but at speeds 1,000 times faster than the original model, meaning it requires just 50 milliseconds to create one second of speech. In fact, the model is not just quicker, but also higher-fidelity, capable of creating waveforms with 24,000 samples a second. We have also increased the resolution of each sample from 8 bits to 16 bits, the same resolution used in compact discs.
It is also now capable of running at scale and is the first product to launch on Google’s latest TPU cloud infrastructure.