SSML is the standard for adding features such as silence and emphasis to speech synthesis. Some backends implement their own support (like Larynx and IBM), but others don't support it (mostly the minimal, local ones like espeak and pico). Having SSML available at the TTS abstraction level would bring support to those engines and make it reliable, since we could then expect every backend to support SSML. It would also let us mix and match languages and even engines! (Yeah, you could have an argument between different TTSs.)
Even if we just rely on the internals of the engines that already have SSML support, we'll need some way to expose it, since it is crucial for great interactions and will be a must when VAP capabilities mature.