chazo1994 closed this issue 4 years ago.
For average-length sentences, the inference time of Glow-TTS on a V100 GPU is about 40ms, whereas the inference time of FastSpeech was reported as ~25ms. You can find the numbers in the papers (FastSpeech: Table 2, Glow-TTS: Section 5.2, Sampling Speed).
Glow-TTS is slower than FastSpeech, but I don't think the difference is significant in an end-to-end setting, because vocoders are usually much slower than parallel TTS models.
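For anyone reproducing these numbers: GPU latency is easy to under-count because CUDA kernel launches are asynchronous, so the timer has to wait for the device before stopping. A minimal timing sketch (the helper name `benchmark_ms` and the warm-up/run counts are my own choices, not from the repo):

```python
import time

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:  # the CPU-only path still works without torch
    _HAS_CUDA = False

def benchmark_ms(infer_fn, *args, n_warmup=5, n_runs=50):
    """Average wall-clock latency of `infer_fn(*args)` in milliseconds.

    On GPU, torch.cuda.synchronize() is required because kernel
    launches return before the work finishes; without it the timer
    stops too early and the reported latency is too small.
    """
    for _ in range(n_warmup):  # warm-up: cuDNN autotuning, allocator caches
        infer_fn(*args)
    if _HAS_CUDA:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer_fn(*args)
    if _HAS_CUDA:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs * 1000.0
```

Pass the model's forward/inference call (wrapped in a lambda) as `infer_fn` and compare the returned per-run milliseconds across models.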
@jaywalnut310 Did you try to run inference on CPU and compare it with GPU?
I tried to measure inference on CPU, but I don't remember the numbers accurately, so I can't say for certain; I think it took about 500ms to generate an average-length sentence on CPU.
You can try it yourself with the pretrained model and the inference notebook. Closing the issue.
How would one configure the inference notebook to be CPU compatible?
I modified the script by adding the following after the import statements:

```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```

Then I replaced every call to `cuda()` with `to(device)`. The script nearly finishes, but fails on the second-to-last line of code.
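For reference, the replacement pattern described above looks like this in isolation (a toy `Linear` module stands in for the actual Glow-TTS model; the names here are illustrative, not from the notebook):

```python
import torch

# Select GPU 0 when a CUDA driver is present, otherwise fall back to CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Toy stand-in for the model: the pattern is `module.to(device)` and
# `tensor.to(device)` everywhere the original code called `.cuda()`.
model = torch.nn.Linear(4, 2).to(device).eval()
x = torch.zeros(1, 4).to(device)  # inputs must live on the model's device

with torch.no_grad():
    y = model(x)
```

The advantage over `.cuda()` is that the same script runs unchanged on CPU-only machines.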
When it tries to execute:

```python
audio = waveglow.infer(y_gen_tst, sigma=.666)
```

it raises:

```
RuntimeError: Found no NVIDIA driver on your system.
```
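A likely cause (my guess, not confirmed from the notebook source) is that the WaveGlow checkpoint is loaded with a plain `torch.load`, which tries to restore GPU-saved tensors onto CUDA regardless of the `device` variable; `torch.load`'s `map_location` argument remaps them. A sketch of the idea, using an in-memory buffer to stand in for the real `.pt` file:

```python
import io
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# A checkpoint saved from GPU tensors tries to restore them onto CUDA,
# which raises "Found no NVIDIA driver" on a CPU-only machine.
# map_location remaps every stored tensor to the chosen device instead.
buf = io.BytesIO()                       # stands in for the WaveGlow .pt file
torch.save({"w": torch.zeros(3)}, buf)
buf.seek(0)
ckpt = torch.load(buf, map_location=device)

# If the checkpoint was saved in half precision, CPU inference usually
# also needs a cast back to float32 (CPU half-precision support is limited).
w = ckpt["w"].to(device).float()
```

If the notebook also calls `.cuda()` or `.half()` on the WaveGlow model itself, those calls would need the same `to(device)` treatment for the CPU path.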
Did you compare your proposed method with other parallel TTS models like FastSpeech? How does your latency compare with those models?