rhasspy / rhasspy-tts-wavenet-hermes

MQTT service for text to speech using Google's Wavenet and the Hermes protocol
MIT License

Support for SSML #3

Open digitalfiz opened 4 years ago

digitalfiz commented 4 years ago

I would like to create this issue to request this feature. I will probably be working on a PR for this but I would like to document the request in case I can't.

Would be sweet if it could look for <speak> at the beginning of the string to know it's going to be SSML and send the request to Wavenet accordingly.

Something like

if say.text.startsWith('<speak>'):
    synthesis_input = texttospeech.SynthesisInput(ssml=say.text)
else:
    synthesis_input = texttospeech.SynthesisInput(text=say.text)

On this line: https://github.com/rhasspy/rhasspy-tts-wavenet-hermes/blob/master/rhasspytts_wavenet_hermes/__init__.py#L110

koenvervloesem commented 4 years ago

I think @maxbachmann is working on this for PicoTTS; maybe you two can align your ideas?

maxbachmann commented 4 years ago

Depending on the TTS engine, some use the speak tag and some do not even support it (probably for simplicity). However, I would like to stay as close as possible to the actual standard https://www.w3.org/TR/speech-synthesis/#edef_speak, so I think requiring the tag makes sense for us, especially since it lets us differentiate between SSML and non-SSML input (that's relevant when people use characters like < in their text that need to be escaped in SSML).
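To illustrate the escaping point: a minimal sketch of turning plain text into SSML, assuming plain text should be wrapped in a speak tag with its special characters escaped. The helper name to_ssml is hypothetical and not part of this repository; xml.sax.saxutils.escape is from the Python standard library.

```python
from xml.sax.saxutils import escape


def to_ssml(plain_text: str) -> str:
    """Wrap plain text in a <speak> element (hypothetical helper).

    escape() replaces &, < and > with XML entities so that characters
    in the user's text are not mistaken for markup.
    """
    return "<speak>" + escape(plain_text) + "</speak>"


print(to_ssml("1 < 2 & 3"))
```

Without the escaping step, a literal < in the text would be read as the start of a tag once the string is treated as SSML.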

I think the approach of checking whether the speak tag is used and then sending the input to the engine accordingly should work fine, since Google already filters out unsupported tags anyway and most SSML tags should be supported by Google. In the future this might need to be extended, since e.g. the audio tag for Google requires a URL to get the audio from, while a user might simply want to inject some local audio.

With PicoTTS the big difference is that it does not support SSML but only a different meta language with far fewer tags, so tags like say-as need to be handled before the text is even sent to the TTS.
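As a rough illustration of preprocessing SSML before it reaches an engine that does not understand it, here is a crude fallback that simply drops all tags and keeps the spoken text. This is an assumption for illustration only, not how the PicoTTS work handles say-as; the function name ssml_to_plain_text is hypothetical.

```python
import xml.etree.ElementTree as ET


def ssml_to_plain_text(text: str) -> str:
    """Strip SSML markup and return only the text content (hypothetical helper).

    Input that is not well-formed XML is returned unchanged, on the
    assumption that it was plain text to begin with.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return text
    # itertext() yields the text of the element and all nested elements in order.
    return "".join(root.itertext())


print(ssml_to_plain_text(
    '<speak>Hello <say-as interpret-as="characters">abc</say-as> world</speak>'
))
```

A real implementation would translate tags such as say-as into whatever the target engine offers instead of discarding them.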

@digitalfiz Do you want to create the Pull Request? The code shown above should already be sufficient (only the startsWith call should be startswith instead, since that is the Python string method).
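A minimal sketch of the check with that correction applied. To keep it runnable without the Google client library, it only builds the keyword arguments; the helper name synthesis_input_kwargs is hypothetical, and ignoring leading whitespace during detection is an added assumption that is not part of the proposal above.

```python
def synthesis_input_kwargs(text: str) -> dict:
    """Choose the SynthesisInput keyword based on a leading <speak> tag.

    Hypothetical helper: returns {"ssml": ...} for SSML input and
    {"text": ...} for everything else.
    """
    # Assumption: leading whitespace is ignored for detection only;
    # the original string is passed through unchanged.
    if text.lstrip().startswith("<speak>"):
        return {"ssml": text}
    return {"text": text}


# With google-cloud-texttospeech installed, this would be used roughly as:
#   synthesis_input = texttospeech.SynthesisInput(**synthesis_input_kwargs(say.text))
print(synthesis_input_kwargs("<speak>Hello</speak>"))
print(synthesis_input_kwargs("1 < 2"))
```

Note that this exact-match check would not recognize a speak tag that carries attributes, such as a version or xml:lang attribute.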

digitalfiz commented 4 years ago

Yeah I'll have a PR in a bit.

digitalfiz commented 4 years ago

Sorry took me a few extra days but I put the PR up.

maxbachmann commented 4 years ago

thanks