Open digitalfiz opened 4 years ago
@maxbachmann is working on this for PicoTTS; maybe you two can align your ideas?
Depending on the TTS engine, some use the speak tag and some do not even support it (probably for simplicity). However, I would like to stay as close as possible to the actual standard https://www.w3.org/TR/speech-synthesis/#edef_speak, so I think requiring the tag makes sense for us, especially since it lets us differentiate between SSML and non-SSML input (that's relevant when people use characters like < in their text, which need to be escaped in SSML).
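To illustrate the escaping point, here is a minimal sketch of turning plain text into valid SSML. The helper name `to_ssml` is hypothetical and not part of any Rhasspy module; it only assumes that input already starting with `<speak>` is SSML and everything else is plain text.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str) -> str:
    """Hypothetical helper: wrap plain text in <speak>, escaping XML characters."""
    # Assumption: input that already starts with <speak> is SSML and is passed through.
    if text.lstrip().startswith("<speak>"):
        return text
    # escape() replaces &, < and > with their XML entities.
    return f"<speak>{escape(text)}</speak>"


print(to_ssml("1 < 2"))
print(to_ssml("<speak>Hello</speak>"))
```

Without the speak tag as a marker, there would be no way to tell whether a literal `<` in the input is the start of a tag or just text.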
I think the approach of checking whether the speak tag is used and then sending the input to the engine accordingly should work fine, since Google already filters out unsupported tags anyway and most SSML tags should be supported by Google. In the future this might need to be extended, since e.g. the audio tag for Google requires a URL to get the audio from, while a user might simply want to inject some local audio.
With PicoTTS the big difference is that it does not support SSML, only a different meta language with far fewer tags, so tags like say-as need to be handled before the text is even sent to the TTS.
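As a rough sketch of what "handling tags beforehand" could look like, the following reduces SSML to plain text and spells out `say-as` elements with `interpret-as="characters"`. The function `flatten_ssml` is hypothetical and makes no claim about PicoTTS's own markup or about how the PicoTTS work mentioned above is actually implemented; it also assumes the SSML carries no XML namespace.

```python
import xml.etree.ElementTree as ET


def flatten_ssml(ssml: str) -> str:
    """Hypothetical helper: reduce SSML to plain text for an engine without SSML support."""
    root = ET.fromstring(ssml)
    parts = []

    def walk(node):
        if node.tag == "say-as" and node.get("interpret-as") == "characters":
            # Spell the content out character by character.
            parts.append(" ".join("".join(node.itertext()).strip()))
        else:
            # Any other tag is dropped; only its text content is kept.
            if node.text:
                parts.append(node.text)
            for child in node:
                walk(child)
        # Text following the element still belongs to the sentence.
        if node is not root and node.tail:
            parts.append(node.tail)

    walk(root)
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join("".join(parts).split())


print(flatten_ssml('<speak>Call <say-as interpret-as="characters">abc</say-as> now</speak>'))
```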
@digitalfiz Do you want to create the Pull Request? The code shown above should already be sufficient (only the startsWith call should be startswith instead).
Yeah I'll have a PR in a bit.
Sorry took me a few extra days but I put the PR up.
thanks
I would like to create this issue to request this feature. I will probably be working on a PR for this but I would like to document the request in case I can't.
It would be sweet if it could look for <speak> at the beginning of the string to know it's going to be SSML and send the request to Wavenet accordingly. Something like
On this line: https://github.com/rhasspy/rhasspy-tts-wavenet-hermes/blob/master/rhasspytts_wavenet_hermes/__init__.py#L110
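The snippet from the original post did not survive in this copy of the thread. The following is a minimal sketch of the idea, not the code that was posted or merged. It assumes the Google Cloud Text-to-Speech input, which accepts either a `text` or an `ssml` field; the helper name `build_synthesis_kwargs` is hypothetical.

```python
def build_synthesis_kwargs(sentence: str) -> dict:
    """Hypothetical helper: pick the input field depending on whether the sentence is SSML."""
    # Note the lowercase Python method name: startswith, not startsWith.
    # This simple check does not match a speak tag that carries attributes.
    if sentence.strip().startswith("<speak>"):
        return {"ssml": sentence}
    return {"text": sentence}


# Assumption: the result is then passed to the client library, e.g.
# texttospeech.SynthesisInput(**build_synthesis_kwargs(sentence))
print(build_synthesis_kwargs("<speak>Hello <break time='1s'/> world</speak>"))
print(build_synthesis_kwargs("Hello world"))
```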