w3c / pronunciation

Spoken Presentation Task Force deliverables
https://www.w3.org/WAI/APA/task-forces/pronunciation/

Need for Authors to be able to set Social and Emotional Characteristics of TTS (text to speech) #114

Open SuzanneTaylor opened 2 years ago

SuzanneTaylor commented 2 years ago

[This github entry is from the Accessibility for Children Community Group]

Although more research is needed to determine which types of voices work best for which applications at the content level, it is important at the technology-ecosystem level to give authors the ability to set social and emotional speech characteristics.

Situations in which setting these characteristics can be important, with rough examples of markup solutions

The markup suggestions are only meant to give an idea of the type of control that is needed; they have not yet been carefully crafted or edited. Affect-bias defines core categories such as Joy, Shame, Anger, Interest, Excitement, Startle, etc. These categories may help us design markup that allows authors to specify appropriate voice tones.

"Friendliness"

Neutral

Additional Situations to be Addressed

Education

AI

AutoSponge commented 2 years ago

This reminds me of https://www.w3.org/TR/emotionml/. We may need to review it for hints on how to incorporate emotion into this spec.
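As a rough illustration of what an EmotionML-style annotation could look like, here is a minimal Python sketch that builds one emotion annotation with the standard library. The function name `build_emotion_annotation` is hypothetical, and the use of the "big6" category vocabulary and of a `value` attribute on `category` is an assumption based on my reading of EmotionML 1.0. How such an annotation would be attached to spoken text in this spec is an open question, so nothing here should be read as proposed syntax.

```python
import xml.etree.ElementTree as ET

# Namespace and vocabulary URIs as I understand them from EmotionML 1.0
# (assumption: verify against the Recommendation before relying on them).
EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"
BIG6_VOCAB = "http://www.w3.org/TR/emotion-voc/xml#big6"


def build_emotion_annotation(category: str, value: float) -> str:
    """Return an EmotionML-style document with one category annotation.

    Hypothetical helper for illustration only; it is not part of any spec.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be between 0.0 and 1.0")

    # Serialize EmotionML elements with a default (unprefixed) namespace.
    ET.register_namespace("", EMOTIONML_NS)

    root = ET.Element(
        f"{{{EMOTIONML_NS}}}emotionml",
        {"version": "1.0", "category-set": BIG6_VOCAB},
    )
    emotion = ET.SubElement(root, f"{{{EMOTIONML_NS}}}emotion")
    ET.SubElement(
        emotion,
        f"{{{EMOTIONML_NS}}}category",
        {"name": category, "value": f"{value:.2f}"},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example: a mostly "happy" voice, e.g. for a friendly greeting.
    print(build_emotion_annotation("happiness", 0.8))
```

The point of the sketch is only that a category name plus an intensity is a small, author-friendly unit of control; whether this spec should reuse EmotionML vocabularies directly or define its own attributes would still need discussion.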

brennanyoung commented 2 years ago

I strongly approve of anticipating the need for 'affective' characteristics for synthetic voices. In our use case (medical simulation), we use voices that can sound in pain, out of breath, anxious, or relaxed. I agree that EML is a promising place to start. There is some great work in there.