cconcolato closed 1 year ago
I've been wondering about adding text to speech directives too. The current two attributes, tta:speak and tta:pitch, already duplicate parts of SSML, so if we allow direct inclusion of SSML we would end up with a mixed model, which seems non-ideal to me. On the other hand, just reproducing each part of SSML that we think might be useful, as additional TTML2 vocabulary, does not seem like a good idea either.
Can we envisage embedding a profile of SSML instead?
I would not like to create a normative dependency that means implementations must support some feature set of SSML, but I agree that we should specify the model for mapping tta: attributes into SSML. From memory, I think they map onto the prosody element.
If we go with allowing both SSML syntax and tta: syntax, we should mandate that they be equivalent and, if they are not, indicate which one takes precedence.
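To make the mapping and precedence question concrete, here is a minimal Python sketch. It assumes a single correspondence (tta:pitch to the pitch attribute of prosody) and assumes values are copied through unchanged; neither assumption comes from a specification, and the names TTA_TO_PROSODY, prosody_from_tta and find_conflicts are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table: tta: attribute name -> <prosody> attribute name.
TTA_TO_PROSODY = {"tta:pitch": "pitch"}


def prosody_from_tta(tta_attrs, text):
    """Build an SSML <prosody> element from a dict of tta: attributes."""
    prosody = ET.Element("prosody")
    for tta_name, ssml_name in TTA_TO_PROSODY.items():
        if tta_name in tta_attrs:
            # Assumption: the value is copied through unchanged.
            prosody.set(ssml_name, tta_attrs[tta_name])
    prosody.text = text
    return prosody


def find_conflicts(tta_attrs, ssml_prosody):
    """Return the prosody attribute names where the two syntaxes disagree."""
    derived = prosody_from_tta(tta_attrs, "")
    return sorted(
        name
        for name, value in derived.attrib.items()
        if name in ssml_prosody.attrib and ssml_prosody.attrib[name] != value
    )


# Illustrative values only.
element = prosody_from_tta({"tta:pitch": "+10%"}, "He looks up at a sign.")
print(ET.tostring(element, encoding="unicode"))

explicit = ET.fromstring('<prosody pitch="low">He looks up at a sign.</prosody>')
print(find_conflicts({"tta:pitch": "+10%"}, explicit))
```

A non-empty result from find_conflicts is the case where the specification would need to say which syntax wins.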
Note to self: there's another W3C spec that specifies how to inject SSML into an attribute - consider if that approach could work here.
It's https://www.w3.org/TR/spoken-html/ and is a working draft right now.
Key question for us here: exactly how should more advanced SSML be embedded syntactically into the DAPT Script?
Here is a real-world example of SSML:
<speak version="1.0" xml:lang="en-us" xmlns="http://www.w3.org/2001/10/synthesis">
<prosody rate="fast">The boy smiles then backs away from the window. He looks up at a sign above the storefront. It depicts a coiled <phoneme alphabet="ipa" ph="ˈkō-brə">cobra</phoneme> and the words, "strike like a cobra. Cobra Kai Karate."</prosody>
</speak>
As far as I understand, the prosody part can be represented with tta:rate but the phoneme part is not currently possible.
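The gap can be checked mechanically. The Python sketch below parses a shortened version of the SSML example and sorts its elements into those with a tta: counterpart and those without. The set HAS_TTA_EQUIVALENT simply encodes the understanding stated in this comment and is an assumption, not a statement about the specification.

```python
import xml.etree.ElementTree as ET

SSML = """<speak version="1.0" xml:lang="en-us" xmlns="http://www.w3.org/2001/10/synthesis">
<prosody rate="fast">It depicts a coiled <phoneme alphabet="ipa" ph="ˈkō-brə">cobra</phoneme>.</prosody>
</speak>"""

# Assumption taken from the comment: only <prosody> has a tta: counterpart.
HAS_TTA_EQUIVALENT = {"prosody"}


def classify(ssml_text):
    """Split SSML element names into those with and without a tta: counterpart."""
    root = ET.fromstring(ssml_text)
    mapped, unmapped = [], []
    for element in root.iter():
        local_name = element.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
        if local_name == "speak":
            continue  # container element, not a speech directive
        if local_name in HAS_TTA_EQUIVALENT:
            mapped.append(local_name)
        else:
            unmapped.append(local_name)
    return mapped, unmapped


print(classify(SSML))
```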
As discussed, because other groups are looking into similar topics, we don't want to jump to a conclusion yet. The proposal is to add a note to the DAPT specification saying something like:
Part of the vocabulary of DAPT overlaps with SSML. This version of the specification does not specify how SSML can be either generated from DAPT or embedded into DAPT. Future versions of this specification may do so.
One option is to specify a complex mapping to the SSML <voice> element from attributes on <ttm:agent>.
The Timed Text Working Group just discussed SSML, and agreed to the following:
SUMMARY: Gravitating towards multi-attribute approach maybe in a ssml-specific DAPT namespace
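As a rough illustration of what a multi-attribute approach might look like, the Python sketch below reads two attributes in an SSML-specific namespace from a span and emits the equivalent SSML phoneme element. Everything here is hypothetical: the namespace URI urn:example:dapt-ssml, the dssml prefix, and the attribute names alphabet and ph were invented for this sketch and are not part of DAPT.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, invented for this sketch.
DAPT_SSML_NS = "urn:example:dapt-ssml"

FRAGMENT = """<span xmlns="http://www.w3.org/ns/ttml"
      xmlns:dssml="urn:example:dapt-ssml"
      dssml:alphabet="ipa" dssml:ph="ˈkō-brə">cobra</span>"""


def span_to_phoneme(span):
    """Convert a span carrying the hypothetical attributes to an SSML <phoneme>."""
    phoneme = ET.Element("phoneme")
    for name in ("alphabet", "ph"):
        value = span.get(f"{{{DAPT_SSML_NS}}}{name}")
        if value is not None:
            phoneme.set(name, value)
    phoneme.text = span.text
    return phoneme


phoneme = span_to_phoneme(ET.fromstring(FRAGMENT))
print(ET.tostring(phoneme, encoding="unicode"))
```

One attribute per SSML attribute keeps the DAPT document free of nested SSML markup, at the cost of defining a new attribute for each SSML feature to be supported.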
Given the overlap between DAPT and SSML, it would be good to have a clarification on how they relate and can be used together (or not). Section https://w3c.github.io/dapt/#foreign-elements-and-attributes could have an example of "proprietary" metadata mixing SSML and DAPT.