A role for indicating whether a given ruby represents phonetics

murata2makoto commented 3 years ago

On behalf of the Japan DAISY Consortium, I would like to request a role for indicating whether or not a given ruby represents phonetics.

If a ruby represents phonetics, the T2S engine should render either the base or ruby. If not, the T2S engine should render both.

For more about this, see section 4.4 of Text to Speech of Electronic Documents Containing Ruby: User Requirements.

jnurthen commented 3 years ago

@murata2makoto How does the T2S engine know which one of the base or the ruby it should render if this new role is encountered? Is it expected that the T2S engine will be supplied both and it decides or would the browser or AT be making this decision?

murata2makoto commented 3 years ago

How does the T2S engine know which one of the base or the ruby it should render if this new role is encountered?

This depends on the interface between the T2S engine and screen reader or user agent. Ideally, the T2S engine should receive the ruby base as such and receive the ruby text as such. Based on these two information items, the T2S engine should make a good decision. However, when the T2S engine receives flat text and nothing else or when the accessibility tree does not have these two information items, this idealistic scenario is not possible.

Text to Speech of Electronic Documents Containing Ruby: User Requirements considers this issue thoroughly. But the rule of thumb is to use the ruby base rather than the ruby text.

cookiecrook commented 3 years ago

Thanks for speaking with the ARIA WG today @murata2makoto and helping us understand the examples in your slides. The WG discussion continued after you left, and I spent several hours considering your User Requirements page and discussing with colleagues.

The Working Group consensus is that the distinctions you're trying to make are not limited to the use case of accessibility APIs or assistive technology, so the ARIA spec isn't the best location for them. We suggest pursuing a new content attribute, most likely on the <ruby> or <rt> element that clarifies either:

- a semantic distinction between the known types of content, or
- a functional distinction between the expected behaviors of user agents.

I'll separate my later individual suggestions into another comment to make it more clear this comment ^ is from the working group and my later comment below is just brainstorming.

cookiecrook commented 3 years ago

The following comment does not represent a consensus view of any organization (WG, employer, etc). It's just a collection of ideas that may be helpful in your progress to find a workable solution.

You mentioned three functional distinctions in your use cases, but I see several more semantic content types in the User Requirements page. I'll attempt to outline them here, including similar examples in English. Note that category 2 (phonetic-optional) incorporates both a speech use case and the mainstream use case you mentioned today: allowing the user to toggle the visible display of these optional groupings.

The category names are just suggestions to help me keep them straight. I'm not attached to the names.

1. phonetic-required: Phonetic usage where the rt should always be visibly displayed and pronounced, as it defines an uncommon pronunciation that most would not know.
- your unusual people and place names example
- As a similar place example from the US, there is a road in Austin spelled both "Manchaca" and "Menchaca" but pronounced by most locals as "Man-Chack."
- possibly your Dewanai example, which as I understood it, is equivalent to the English "My name is Knotnot" which means "My name is Knot" but may be misspoken by TTS (and misunderstood) as "My name is not Knot."
2. phonetic-optional: Phonetic usage where the rt provides a pronunciation hint to less-experienced readers. Toggling the visible display of these may be context dependent, as experienced readers would not need them and may find them distracting. Text-to-speech may(?) pronounce the base text as it is sometimes less ambiguous than the rt, but in most cases, the TTS pronunciation of the rt or base will be identical.
- Beginning to intermediate readers, such as your furigana examples
- Difficult-to-pronounce species or pharmaceutical names may be appropriate to include here, such as oxoerythromycin (ruby: oxo-eur-ithro-mycin) or oxoerythromycin (ruby: /ɒk.sə(ʊ).ɪˌɹɪθ.ɹə(ʊ)ˈmʌɪ.sɪn/). Some readers may find the pronunciations helpful, where a domain expert may prefer to hide the pronunciation.
3. phonetic-complementary: Phonetic usage where both the rt and the base should be spoken.
- Some of your Gikun examples such as 背景 (ruby: バック), i.e. HAIKEI (ruby: back). These are a type of phonetic usage, but in this context, the phonetic rt kana does not define the pronunciation of the base, so the text-to-speech user would miss out on some context if only one were spoken.
4. notes: Interlinear notes, a non-phonetic usage where both the rt and the base should be spoken. Similar usage to a parenthetical in western languages.
- 徳川家康 (ruby: 1543-1616 江戸幕府最後の将軍), effectively "Tokugawa Ieyasu (1543-1616, the last shogun of the Edo shogunate)"
5. wordplay(?): Another non-phonetic usage where both the rt and the base should be spoken. (I'm not certain "wordplay" describes all similar uses)
- Your example of enemy (ruby: friend) meaning frenemy (I could not copy the Japanese text out of your images)
- Another example from a colleague: 丁度良い所に常連カモが来たよ (look, here comes a regular customer now) where 常連 (regular customer)’s ruby says カモ (easy target/victim)

The semantic categories 3, 4, and 5 equate to the same functional category: "always display and speak both" so they might be combined (parenthetical) if there's no other functional need to keep them separate.

What about proposing a new type attribute on the <ruby> element with values [phonetic-required | phonetic-optional | parenthetical]?

Once the Ruby/HTML working group agrees on a specific attribute, we could map it to accessibility and speech APIs to achieve the correct pronunciations, and use it for the other education use case of a visibility toggle on optional Ruby.
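To make the functional distinction concrete, here is a minimal sketch of how the three proposed values might drive speech output. Everything in it is an assumption for illustration: the `type` attribute is only a proposal in this thread and is not part of HTML, `RubyType` and `text_to_speak` are hypothetical names, and the choice to speak only the rt for phonetic-required is one possible reading of "displayed and pronounced" for a ruby-unaware engine.

```python
from enum import Enum


class RubyType(Enum):
    # Values mirror the attribute values floated in this thread; none are standardized.
    PHONETIC_REQUIRED = "phonetic-required"
    PHONETIC_OPTIONAL = "phonetic-optional"
    PARENTHETICAL = "parenthetical"


def text_to_speak(base: str, rt: str, ruby_type: RubyType) -> list[str]:
    """Return the strings handed to a ruby-unaware TTS engine, in speaking order."""
    if ruby_type is RubyType.PHONETIC_REQUIRED:
        # Assumption: the rt defines an uncommon pronunciation, so it stands in for the base.
        return [rt]
    if ruby_type is RubyType.PHONETIC_OPTIONAL:
        # The rt is only a reading hint; the base is usually the less ambiguous input.
        return [base]
    # Parenthetical (notes, wordplay, gikun): both halves carry meaning, so speak both.
    return [base, rt]


if __name__ == "__main__":
    print(text_to_speak("Manchaca", "Man-Chack", RubyType.PHONETIC_REQUIRED))
    print(text_to_speak("背景", "バック", RubyType.PARENTHETICAL))
```

A ruby-aware engine could instead receive both strings for the phonetic cases and use the rt purely as a pronunciation hint; the sketch only covers the flat-text fallback.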

cookiecrook commented 3 years ago

The other feedback I'd like to share is related to the understandability of your User Requirements document. A few small changes may help others understand it more easily.

- There were a lot of identically named and numbered sections. For example, there are several different sections labeled "1" and "Furigana", each with a slightly different context. It was difficult to parse the sectional context when all the subsections were named the same. Consider renaming and renumbering the second "1) Furigana" section to "3.1.1 Furigana when both read aloud" to give more context.
- Formatting: Several of the headings use the wrong heading style/size, which added to my confusion. For example, Section 3.3 uses an H3 (should be H2) and section 3.2(1) uses an H2 (should be H3).
- As these Ruby variants are not common in Western languages, consider adding some English equivalents to the User Requirements page (e.g. Dewanai: My name is Knotnot), to assist the understanding of any reader (such as myself) who does not read Japanese.

Best of luck to you! Thank you.

aleventhal commented 3 years ago

What about proposing a new type attribute on the element with values [phonetic-required | phonetic-optional | complementary]

+1, nice summary and crisp proposal.

aleventhal commented 3 years ago

Is it potentially confusing that complementary is nothing like role=complementary? Is it more like role=note?

Perhaps not a big deal.

aleventhal commented 3 years ago

Another question, do the semantics only need to support 2 levels of readers (read all annotations or only difficult ones). Or are there plenty of intermediate readers where neither case fits perfectly?

cookiecrook commented 3 years ago

Is it potentially confusing that complementary is nothing like role=complementary?

I'm not attached to the term "complementary" so we should probably not use that. I was thinking that the base text and the ruby text are complementary to each other, so both should be displayed in all modalities. However, the ruby text is not phonetic, so TTS engines should speak it but not use it to infer pronunciation hints.

Is it more like role=note?

Not all of these are notes. If you want to use the semantic distinction rather than a functional one, there may need to be more than 3 values.

[Update: maybe "parenthetical"?]

cookiecrook commented 3 years ago

Another question, do the semantics only need to support 2 levels of readers (read all annotations or only difficult ones). Or are there plenty of intermediate readers where neither case fits perfectly?

I assume this question is for @murata2makoto.

However, my understanding is that it would be up to the app to display the optional ones or not, based on the in-app user preference, rather than some magic in the user agent.

In either case, the TTS engine would rely on the phonetic-required, and could use or ignore the phonetic-optional.

murata2makoto commented 3 years ago

The WG discussion continued after you left, and I spent several hours considering your User Requirements page and discussing with colleagues.

Thank you for studying this document carefully. Shimono-san of Keio W3C converted it to a ReSpec document (https://w3c.github.io/ruby-t2s-req/). We plan to create a W3C Note.

The Working Group consensus is that the distinctions you're trying to make are not limited to the use case of accessibility APIs or assistive technology, so the ARIA spec isn't the best location for them. We suggest pursuing a new content attribute, most likely on the <ruby> or <rt> element that clarifies either:

a semantic distinction between the known types of content, or

a functional distinction between the expected behaviors of user agents.

Both the JLreq TF and the Japan DAISY Consortium discussed the idea of creating HTML attributes rather than ARIA roles. Nobody has a strong opinion. We can go either way.

In the I18N session after the ARIA WG meeting, we discussed where in WHATWG or W3C we should discuss this issue. Even within W3C, we might want to extend the charter of the HTML WG or create a community group dedicated to this issue. It is not yet clear.

murata2makoto commented 3 years ago

@cookiecrook and @aleventhal I sincerely welcome your suggestions. Western examples by James are very useful. But before I indulge in interesting discussions about ruby, I would like to summarize the status quo.

In OWP and EPUB, always-double-reading is unfortunately very common. This is very bad for phonetic ruby, which is most common. Always-base-only-reading (hopefully combined with better T2S engines for Japanese) would be a big improvement. It might not be ideal but is OK. Base-only-reading as the default and double-reading for non-phonetic ruby would be better but improving T2S engines is probably more important. I don't believe ruby-only-reading is the right way to go forward.
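The three reading policies compared above can be sketched as follows. This is only an illustration under stated assumptions: `Segment`, its `phonetic` flag, and the policy names are hypothetical, and no markup currently exists that would supply the `phonetic` value.

```python
from dataclasses import dataclass
from typing import Optional

POLICIES = ("always-double", "always-base-only", "base-default")


@dataclass
class Segment:
    base: str
    rt: Optional[str] = None  # None for plain text outside any <ruby>
    phonetic: bool = True     # hypothetical flag; would come from future markup


def read_aloud(segments: list[Segment], policy: str) -> str:
    """Flatten segments into the text a TTS engine would receive under a policy."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    out = []
    for seg in segments:
        out.append(seg.base)
        if seg.rt is None:
            continue
        if policy == "always-double":
            # Status quo described above: the ruby text is read right after the base.
            out.append(seg.rt)
        elif policy == "base-default" and not seg.phonetic:
            # Double-read only when the ruby is not a pronunciation of the base.
            out.append(seg.rt)
        # "always-base-only" never adds the ruby text.
    return "".join(out)


if __name__ == "__main__":
    sentence = [Segment("漢字", "かんじ"), Segment("を読む")]
    for policy in POLICIES:
        print(policy, read_aloud(sentence, policy))
```

With phonetic ruby, always-double makes the listener hear the same word twice, which is the problem described above; the other two policies hand the engine the base text only.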

the English "My name is Knotnot" which means "My name is Knot" but may be misspoken by TTS (and misunderstood) as "My name is not Knot."

I appreciate this very much! I will incorporate it into the upcoming note.

in most cases, the TTS pronunciation of the rt or base will be identical.

Although most Japanese think so, I know that some practitioners strongly disagree. I am not sure if I fully understand their reasons, but morphological analysis of kana-only ruby is likely to fail thus providing an unnatural accent and even incorrect pronunciation of は and へ.

murata2makoto commented 3 years ago

@aleventhal

Another question, do the semantics only need to support 2 levels of readers (read all annotations or only difficult ones). Or are there plenty of intermediate readers where neither case fits perfectly?

We can certainly try to introduce more levels. I know that a DAISY reader in Japan allows users to specify a K-12 grade and shows ruby only for kanji beyond that level. But even the developer of that DAISY reader does not think that this has to be captured by markup. Their implementations examine code points of base characters.
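A code-point heuristic of this kind might look roughly like the sketch below. It is not the DAISY reader's actual implementation, which I have not seen. `KANJI_GRADE` is a tiny illustrative sample standing in for a full grade table, and `should_show_ruby` is a hypothetical name.

```python
# Illustrative sample only: a real reader would load the complete grade table.
KANJI_GRADE = {"山": 1, "川": 1, "海": 2}

# Assumption: kanji missing from the table are treated as beyond every school grade.
UNLISTED_GRADE = 99


def is_kanji(ch: str) -> bool:
    """True if ch falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def should_show_ruby(base: str, user_grade: int) -> bool:
    """Show ruby when any base kanji is taught later than the user's grade."""
    for ch in base:
        if is_kanji(ch) and KANJI_GRADE.get(ch, UNLISTED_GRADE) > user_grade:
            return True
    return False


if __name__ == "__main__":
    for word in ("山", "海", "鬱"):
        print(word, should_show_ruby(word, user_grade=1))
```

Because the decision depends only on the base characters and a user setting, nothing in the markup has to say which ruby is for beginners, which matches the developer's view reported above.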

Historically, book catalogs in Japan indicate one of the three levels: ruby-free, para-ruby, and general-ruby. I thus think that we should stick to this tradition in standardization while implementors try interesting experiments.

aleventhal commented 3 years ago

Their implementations examine code points of base characters.

Can we get an understanding of where this heuristic falls down? Or, if it's highly accurate, then do we actually need markup to differentiate between para-ruby and general-ruby?

If the heuristic can be accurate enough, and we only need to know when the ruby is used for a note, then all we really need is to apply role="note" to the <rt> when it is being used as complementary text. The benefit of this is that we could add it to an ARIA draft and easily implement it now. I'm not sure how long it will take to work with the HTML WG to define ruby semantics. I'm not saying we don't want to do that, but that I would seek more information on the real need to differentiate para-ruby and general-ruby in markup, if DAISY readers can already do it without markup.

Finally, I would like to know what the possibilities are for a heuristic that detects the note/complementary situation. Can we get an evaluation on how accurate that could be?

cookiecrook commented 3 years ago

in most cases [of the "phonetic-optional" examples above], the TTS pronunciation of the rt or base will be identical.

Although most Japanese think so, I know that some practitioners strongly disagree. I am not sure if I fully understand their reasons, but morphological analysis of kana-only ruby is likely to fail thus providing an unnatural accent and even incorrect pronunciation of は and へ.

I agree there are edge cases, and that the は example is likely to be pronounced better if the base text is sent to the text-to-speech engine. However, once the text-to-speech engines understand ruby context, I think exposing both (to be pronounced as a single instance) is likely to produce better results, not worse. Ruby-unaware speech engines should just attempt to pronounce the base text in those instances of "phonetic-optional."

cookiecrook commented 3 years ago

@murata2makoto Is it okay to close this issue and #1619, or is there more you'd like to clarify before closing? Please do link any relevant issues in other repositories.

aleventhal commented 3 years ago

@murata2makoto, you mentioned that assuming we can announce the <rt> text instead of the base text would be catastrophic, because it could change the meaning. (I admittedly had a hard time understanding why, since the rt is supposed to be an annotation, and my understanding of Japanese is zero). However, are there times that it would be better to read the rt instead of the base text, and is that something we might want to put in author control via a semantic? Right now, I think our semantics would suggest times to read both, but not just the rt text.

aleventhal commented 3 years ago

Should I be asking my questions about semantics in https://github.com/w3c/ruby-t2s-req/issues/7 ?

murata2makoto commented 2 years ago

@cookiecrook

I am very sorry for this belated reply. I deeply appreciate all your suggestions.

Re: "phonetic-required" and "phonetic-optional"

Thank you very much for this suggestion. But I am not sure if they should be separated.

First, ruby causes serious problems for some Japanese people with dyslexia. They mistakenly think that the ruby is a strange radical and fail to recognize the base character. Hiding every ruby is a sensible option. (But using a different color for ruby is also sensible, as is widening the gap between the base and the ruby.)

Second, when the same word is repeated, it is quite common to make ruby visible only for the first occurrence. Thus, I do not think that some ruby should always be visible.

w3c / aria

A role for indicating whether a given ruby represents phonetics #1620