Closed by josephrocca 1 year ago
Woah 🤯 This definitely sounds doable! I'll look into it (and hopefully add it quite soon 💪)
I've been looking into it more today, but it seems as though HF does not support text-to-speech in the pipeline function? It also appears that optimum doesn't support text-to-speech as a task (needed to convert to ONNX format).
Fortunately, the spaces demo you sent above includes some code I can use for testing (https://huggingface.co/spaces/Matthijs/speecht5-tts-demo/blob/main/app.py).
If you want, you could even open up a feature request / PR on the main transformers branch to add this.
Just came across this:
The audio quality here seems quite good for the model size.
@josephrocca Do you have any favorite model or do you use different models for different tasks?
I am very much looking forward to this 🙈
I guess the decision matrix would contain:
Just tested Bark: https://github.com/huggingface/transformers/issues/23036
I like the ability to add emotion, just funny that it suddenly changed the voice/gender too 😅
Yeah, expressiveness (i.e. not sounding robotic) is the main factor for me, and next is inference speed. Size is probably not a big issue for my use cases: anything under 500 MB is fine.
Name of the feature
Text-to-speech using SpeechT5, which was recently added to Transformers.
Reason for request
The browser's default TTS API is quite bad if you want to create an experience that works nicely across all browsers. Firefox's voices in particular are extremely robotic. Some applications require that the voice is consistent, and of a particular style/tone/etc. SpeechT5 allows you to create 512-dim speaker embeddings so you can use an arbitrary voice style.
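To make the speaker-embedding idea concrete, here is a rough sketch of how a 512-dim embedding might be fed to SpeechT5 through the Python Transformers model classes directly (not the pipeline function). Treat it as an illustration under assumptions rather than a reference implementation: `prepare_speaker_embedding`, `synthesize` and `EMBEDDING_DIM` are hypothetical names, the `microsoft/speecht5_tts` and `microsoft/speecht5_hifigan` checkpoints are assumed rather than taken from this thread, and scaling the embedding to unit length is an assumption about how the embeddings were produced.

```python
import math

EMBEDDING_DIM = 512  # speaker embedding size mentioned in this request


def prepare_speaker_embedding(values):
    """Validate a speaker embedding and scale it to unit L2 norm.

    Unit-normalizing is an assumption of this sketch; skip it if your
    embeddings are already prepared the way the model expects.
    """
    values = [float(v) for v in values]
    if len(values) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(values)}")
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        raise ValueError("speaker embedding must not be all zeros")
    return [v / norm for v in values]


def synthesize(text, speaker_embedding):
    """Return (waveform, sample_rate) for `text` spoken in the given voice.

    Downloads the assumed checkpoints on first use, so it needs network
    access plus the `torch` and `transformers` packages.
    """
    # Heavy imports stay local so the helper above works without them.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    # The model expects a batch dimension: shape (1, 512).
    embedding = torch.tensor(speaker_embedding, dtype=torch.float32).unsqueeze(0)

    with torch.no_grad():
        speech = model.generate_speech(
            inputs["input_ids"], embedding, vocoder=vocoder
        )
    # SpeechT5 produces 16 kHz audio.
    return speech.numpy(), 16000
```

Usage would look something like `waveform, rate = synthesize("Hello there", prepare_speaker_embedding(vec))`, where `vec` is any 512-value speaker embedding you already have. Swapping `vec` is what changes the voice; the text-side inputs stay the same.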
Additional context
Example clip from the Spaces demo (this embedding is pretty monotone):
tmptgsysvc8.webm