w3c / pronunciation

Spoken Presentation Task Force deliverables
https://www.w3.org/WAI/APA/task-forces/pronunciation/

Inching towards cases where the same symbol/character/sequence is used for different unit types #109

Open brennanyoung opened 2 years ago

brennanyoung commented 2 years ago

A question came up on Stack Overflow about steering the pronunciation of the "inch" symbol.

This is typically a straight double quote character, although "smart quotes" are not uncommon (an artefact of 'clever' word processor features), and Unicode offers the "double prime" character (U+2033), which is (I suppose) the most correct choice.
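To show how many code points are in play, here is a minimal Python sketch that maps the variants mentioned above onto U+2033 DOUBLE PRIME when they directly follow a digit. The function name and the digit-adjacency rule are assumptions made for this illustration, not anything taken from SSML or from an existing tool. The rule cannot tell inches from seconds, and it would misfire on a closing quotation mark that happens to follow a number.

```python
import re

# Characters commonly typed where a double prime is meant:
# U+0022 straight quote, U+201C/U+201D "smart" quotes, U+2033 double prime.
DOUBLE_PRIME_VARIANTS = "\u0022\u201c\u201d\u2033"

# Assumption for this sketch: a variant directly after a digit is a unit mark.
_UNIT_MARK = re.compile(r"(?<=\d)[" + DOUBLE_PRIME_VARIANTS + "]")


def normalize_double_prime(text: str) -> str:
    """Replace quote-like characters that directly follow a digit with U+2033."""
    return _UNIT_MARK.sub("\u2033", text)
```

Normalizing the character only settles which glyph is in the text. It says nothing about which unit the author meant, which is the actual question here.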

This symbol is also used for "seconds" (a unit of both time and angle).

Imagine a document where the double quote symbol is used for both inches and seconds, in different contexts.

How might we distinguish "seconds" from "inches", and both of these from a standard "double quote" used to denote speech or citation?

(Similarly with feet, minutes and the single prime, which are all expressed by single quote characters. There are certainly other examples.)

Does the current SSML spec support this? I can't find a viable interpret-as value for say-as that would make this distinction unambiguous.

AutoSponge commented 2 years ago

@brennanyoung thanks for this use case!

I think the universally acceptable choice here in SSML would be <sub alias="seconds">&quot;</sub>. The catch, which also applies when the symbol is used for length (inch), is that sub gives no number agreement: the alias is spoken as written, whatever quantity precedes it. Parsing the content and handling plurals automatically is more like what say-as does. This leaves authors using sub and supplying the correct translation and quantity agreement for each instance, which is less than ideal. The detail attribute might help, except that it is implementation-specific and does not convey author intent.
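To make the per-instance burden concrete, here is a minimal Python sketch of what an authoring tool would have to do to emit sub with number agreement. The function name, the unit table, and the English-only "singular when the quantity is exactly 1" rule are all assumptions for illustration. Only the sub element and its alias attribute come from SSML itself.

```python
from xml.sax.saxutils import escape

# Hypothetical unit table for this sketch: (singular, plural) spoken forms.
UNIT_NAMES = {
    "inch": ("inch", "inches"),
    "second": ("second", "seconds"),
}


def sub_for_unit(quantity, unit, symbol='"'):
    """Return 'N <sub alias="...">symbol</sub>' with an alias that agrees with N."""
    singular, plural = UNIT_NAMES[unit]
    # Assumed English rule: singular only when the quantity is exactly 1.
    alias = singular if quantity == 1 else plural
    # Escape the symbol for XML; the double quote becomes &quot;.
    escaped = escape(symbol, {'"': "&quot;"})
    return f'{quantity:g} <sub alias="{alias}">{escaped}</sub>'


print(sub_for_unit(1, "inch"))     # 1 <sub alias="inch">&quot;</sub>
print(sub_for_unit(12, "inch"))    # 12 <sub alias="inches">&quot;</sub>
print(sub_for_unit(1, "second"))   # 1 <sub alias="second">&quot;</sub>
```

Note that the caller still has to say which unit is meant. The markup records the spoken form, not the fact that the symbol stood for a length or a time.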

The say-as element, in some implementations (see AMZ), will correctly pronounce <say-as interpret-as="time">1 &quot;</say-as> as "one second" because it does not expect to parse the contents given a (required) format attribute. There's also no length value for interpret-as in any implementation I've seen. So, while this works in specific contexts and specific implementations, it's not by any means universal.

Therefore, this may be another use case for our TF to look beyond SSML and ensure author intent is preserved while taking advantage of basic heuristics in TTS. I look forward to SSML, MathML, and TTS engine experts weighing in on this issue.