mozilla / TTS

:robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Mozilla Public License 2.0

Inference time metrics? #8

Closed claytonblythe closed 6 years ago

claytonblythe commented 6 years ago

Hello, I am very interested in this project. I am looking for a PyTorch implementation of Tacotron/Tacotron2/WaveNet and may wish to contribute. Do you have any metrics on forward-pass time for inference on new text? I am looking to export a PyTorch model to Caffe2 and run it on a mobile platform.

erogol commented 6 years ago

Hi @claytonblythe. Contributors are always welcome.

I do not have exact measurements for inference yet; I am still working on the training part. I'll share them once I have some.

The caveat of using ONNX to convert models to Caffe2 (if that is your plan) is that it does not yet support RNN layers. We therefore plan to change the network architecture and make it RNN-free for easier deployment on mobile, but that work has not started yet.

claytonblythe commented 6 years ago

That is good to know; I was not aware. I probably won't be at that stage for a while, but longer term, once you get to that point, exposing it as an API with Flask or something similar would be awesome as well.
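To illustrate the API idea, here is a minimal sketch of a synthesis endpoint. It uses Python's built-in `http.server` instead of Flask so it runs with no dependencies, and the `synthesize` function is a hypothetical stub standing in for a real TTS model call; none of these names come from this repository.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def synthesize(text):
    """Hypothetical stub: a real server would run the TTS model here
    and return the rendered audio as WAV bytes."""
    return b"RIFF" + text.encode()  # placeholder, not real audio

class TTSHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, e.g. {"text": "Hello world"}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        audio = synthesize(payload["text"])
        # Return the audio bytes with a WAV content type.
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)

    def log_message(self, *args):
        pass  # keep the example quiet

def serve(port=8080):
    """Start the server on a background thread and return it."""
    server = HTTPServer(("127.0.0.1", port), TTSHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client would then POST text and receive audio back; in production one would add batching, streaming, and input validation on top of this skeleton.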

erogol commented 6 years ago

To give a small update: the following code takes 5.02 seconds on a GPU that is already training another model, and 15.08 seconds on CPU; note again that the machine is under very heavy load right now.

The generated audio is 5.1 seconds long, so it is almost real time on a GPU, and the margin would probably be safer on a single free GPU.

```python
sentence = "The human voice is the most perfect instrument of all."
alignment = tts(model, sentence, CONFIG, use_cuda, ap)
```
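The "almost real-time" claim can be made precise with the real-time factor (RTF): synthesis time divided by the duration of the generated audio, where RTF < 1 means faster than real time. A small sketch using the numbers reported above:

```python
def real_time_factor(synthesis_secs, audio_secs):
    """RTF = time spent synthesizing / duration of the audio produced.
    Values below 1.0 mean the system runs faster than real time."""
    return synthesis_secs / audio_secs

# Figures from the comment above, measured on a heavily loaded machine:
# 5.02 s on GPU and 15.08 s on CPU, for 5.1 s of generated audio.
gpu_rtf = real_time_factor(5.02, 5.1)   # just under 1.0
cpu_rtf = real_time_factor(15.08, 5.1)  # roughly 3x slower than real time
```

Since the GPU was simultaneously training another model, an RTF just under 1.0 here suggests comfortable real-time headroom on an idle GPU.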