w3c / a11y-discov-vocab

Repository for the maintenance of the schema.org accessibility property values for discoverability.
https://www.w3.org/community/a11y-discov-vocab/

Distinguishing fully-annotated ruby and partially-annotated ruby #68

Closed murata2makoto closed 1 year ago

murata2makoto commented 1 year ago

I raise this issue on behalf of the technical committee of the Japan DAISY consortium. (I am ccing @xfq, @kidayasuo, @r12a, and @himorin).

The current definition of rubyAnnotations in Schema.org Accessibility Properties for Discoverability Vocabulary indicates whether or not ruby annotations are present. But this is not good enough.

Attaching ruby annotations to all CJK ideographic characters in a publication is called "fully-annotated ruby". Readers who have serious difficulty with CJK ideographic characters prefer fully-annotated ruby. The most common request from users of Japanese DAISY textbooks is fully-annotated ruby. Meanwhile, attaching ruby annotations to only some characters (typically the difficult ones) is called "partially-annotated ruby". Many Japanese publications provide "partially-annotated ruby".

Accessibility metadata should indicate whether a given publication allows "fully-annotated ruby" or "partially-annotated ruby". Note that an electronic publication may allow both "fully-annotated ruby" and "partially-annotated ruby" by the use of different styles. Thus, accessibility metadata should be able to represent four possibilities: (1) "fully-annotated ruby" but not "partially-annotated ruby", (2) "partially-annotated ruby" but not "fully-annotated ruby", (3) both "fully-annotated ruby" and "partially-annotated ruby", and (4) neither "partially-annotated ruby" nor "fully-annotated ruby".
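
As a rough illustration of the four possibilities listed above, they can be modelled as two independent flags. This is only a sketch: the class name `RubySupport` and both field names are hypothetical and do not come from the vocabulary.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RubySupport:
    """Hypothetical pair of flags; neither name is defined by the vocabulary."""
    fully_annotated: bool
    partially_annotated: bool


# Enumerate every combination of the two independent flags.
ALL_CASES = [
    RubySupport(full, partial)
    for full, partial in product([True, False], repeat=2)
]

for case in ALL_CASES:
    print(case)
```

Two booleans give exactly the four cases (1) to (4); a single present/absent flag can only express two of them.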

GeorgeKerscher commented 1 year ago

This seems to be the same issue as with MathML. People want to know if there is some MathML or if all the math is marked up with MathML. Looks like the AccessibilitySummary is the right place for this information.

murata2makoto commented 1 year ago

The JDC TC will discuss using the accessibility summary for this in November and then respond. Personally, I feel very strongly that machine-readable metadata is needed here.

mattgarrish commented 1 year ago

We've bumped into this issue of trying to have metadata explain amounts before, as @GeorgeKerscher mentions, and it invariably leads to needing to use a summary. The problem with "partial" is that it really doesn't tell you anything. Does partial mean 1%, in which case whatever feature is being described is probably useless to users, or does partial mean 99%, which is nearly perfect? Does partial cover critical content but not non-critical content, or the other way around? And then there's everything in between those extremes.

Users can't form an idea of how useful the content will be without a description in these cases, so multiplying the features they have to search for by adding quantitative measures tends to add more complexity without enhancing the value.

murata2makoto commented 1 year ago

@mattgarrish

The problem with "partial" is that it really doesn't tell you anything.

In theory, yes. But in reality, no. One commonly-used criterion is Jōyō kanji. I will provide more details in November.

murata2makoto commented 1 year ago

Partial ruby annotation is very different from what Matt (@mattgarrish) and George (@GeorgeKerscher) think.

If a publication has partial ruby annotation, the target audience is expected to have no problems with the CJK ideographic characters in that publication. In other words, ideographic characters too difficult for the target audience should have ruby.

When the target audience is the general public, Joyo Kanji is used as a guideline for determining whether or not ruby is needed. The rule of thumb is simple. Ideographic characters in Joyo Kanji do not need ruby annotations, while those beyond it need ruby.
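
The rule of thumb can be sketched in a few lines. This is a minimal illustration, not a real implementation: `JOYO_SAMPLE` is a tiny hypothetical stand-in for the actual list of 2,136 characters, and the ideograph check covers only two Unicode blocks.

```python
# Tiny hypothetical stand-in for the real Joyo Kanji list (2,136 characters).
JOYO_SAMPLE = set("竹生島日本語")


def is_cjk_ideograph(ch):
    """Rough check: CJK Unified Ideographs block and Extension A only."""
    code = ord(ch)
    return 0x4E00 <= code <= 0x9FFF or 0x3400 <= code <= 0x4DBF


def needs_ruby(text, known):
    """Return ideographs in `text` outside `known`, in order, without repeats."""
    result = []
    for ch in text:
        if is_cjk_ideograph(ch) and ch not in known and ch not in result:
            result.append(ch)
    return result


print(needs_ruby("日本語の薔薇", JOYO_SAMPLE))
```

Here 薔 and 薇 are reported because they are outside the sample set, while kana such as の are ignored because they are not ideographs.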

What is Joyo Kanji? It is a set of CJK ideographic characters commonly used in Japan. The latest version of Joyo Kanji consists of 2,136 ideographic characters. The Japanese government created Joyo Kanji as a guideline rather than a compulsory rule. But in Japanese compulsory education, students learn all ideographic characters in Joyo Kanji and no other ideographic characters. Official documents from the Japanese government should not use ideographic characters beyond Joyo Kanji. For more about Joyo Kanji, see a Wikipedia article.

Are there many ideographic characters beyond Joyo Kanji? Japanese Industrial Standard X 0208 (7-bit and 8-bit double byte coded KANJI sets for information interchange) contains 6,355 ideographic characters. But it is generally agreed that JIS X 0208 does not have enough ideographic characters. A newly-created standard (JIS X 0213) added more than 3,500 ideographic characters. The ideographic characters in JIS X 0208 and JIS X 0213 (at least those in the BMP) are assumed to be usable in EPUB publications. If we use Joyo Kanji as a guideline for partial ruby annotation, almost 8,000 ideographic characters usable in EPUB publications need ruby.
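
The "almost 8,000" estimate can be checked with the figures quoted above. This sketch assumes every Joyo Kanji character is among the usable ones and treats "more than 3,500" as a lower bound, so the result is a lower bound as well.

```python
JOYO = 2_136              # Joyo Kanji, latest version
JIS_X_0208 = 6_355        # ideographic characters in JIS X 0208
JIS_X_0213_ADDED = 3_500  # lower bound: "more than 3,500" added by JIS X 0213

usable = JIS_X_0208 + JIS_X_0213_ADDED
beyond_joyo = usable - JOYO  # characters that would need ruby under the rule

print(beyond_joyo)  # 7719
```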

Although Joyo Kanji is used as a guideline, there are lots of exceptions.

First, publications for kids are different. There are subsets of Joyo Kanji for each grade in elementary schools. These subsets are used as guidelines for determining whether or not ruby is needed for kids. (For more about this, see Kyōiku kanji.)

Second, easy CJK ideographic characters sometimes have very special readings. For example, 竹生島 (chiku-bu-shima) consists of characters in Joyo Kanji but most Japanese cannot read it correctly. Thus, ruby is useful for these simple ideographic characters.

Third, when the same ideographic character occurs repeatedly, we often drop ruby annotations for all but the first occurrence. (But "first" may mean "first in a publication", "first in a section", or "first in a spread", among others.)

Fourth, when the target audience of a publication is different from the general public, Joyo Kanji might be too restrictive. University graduates can probably read more than 1,000 ideographic characters beyond Joyo Kanji. Thus, some authors, publishers, and newspaper companies think that some ideographic characters beyond Joyo Kanji do not need ruby.
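
To make the third exception concrete, here is a small sketch of how the choice of scope changes which occurrences keep their ruby. The function name `ruby_positions` and the two scope names are hypothetical; "first in a spread" is left out for brevity.

```python
def ruby_positions(sections, scope="publication"):
    """Return (section_index, word_index) pairs that keep their ruby.

    `sections` is a list of sections, each a list of base-text words.
    `scope` is "publication" or "section": how far "first occurrence" reaches.
    """
    if scope not in ("publication", "section"):
        raise ValueError(f"unknown scope: {scope!r}")
    keep = []
    seen = set()
    for s_idx, words in enumerate(sections):
        if scope == "section":
            seen = set()  # forget words seen in earlier sections
        for w_idx, word in enumerate(words):
            if word not in seen:
                seen.add(word)
                keep.append((s_idx, w_idx))
    return keep


sections = [["薔薇", "薔薇"], ["薔薇", "醤油"]]
print(ruby_positions(sections, "publication"))
print(ruby_positions(sections, "section"))
```

With publication scope, 薔薇 is annotated once in the whole book; with section scope, it is annotated again at the start of the second section.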

To conclude, if an EPUB publication has partial ruby annotation, those who do not have special problems with ideographic characters can safely assume that none of the ideographic characters in this publication will cause them trouble. But those who do have special problems with ideographic characters are likely to encounter difficult ideographic characters without ruby. Therefore, EPUB metadata should be able to distinguish between full ruby annotation and partial ruby annotation.

mattgarrish commented 1 year ago

Your choice of naming is what I find problematic. Partial can suggest that only some documents include ruby, for example. It's a measure of completeness, and from experience people don't read the descriptions but use the metadata according to how they interpret the names.

In this case, as I understand it, partial doesn't mean incomplete, but that the ruby fully covers a subset of characters.

Is there perhaps a more precise name than "partial" you could use? I don't know if it makes sense, but something like "joyoRuby" wouldn't expose us to seemingly endorsing "partial" features.

If a feature is truly only partially provided, that's where the summary is meant to come in to explain why.

murata2makoto commented 1 year ago

@mattgarrish @himorin @fantasai @frivoal @kidayasuo @xfq

Indeed, the ruby terminology is not at all mature. W3C JLreq uses "general ruby" and "para ruby", which are direct translations of Japanese terms (総ルビ and パラルビ). But non-Japanese experts find "general ruby" and "para ruby" very confusing.

Recently, "partially-annotated ruby" was proposed. I changed my mind today and used "partial ruby annotation" in the comment shown above. But Matt correctly pointed out that "partial" is vague. Indeed, "para ruby" does not imply that authors can freely choose any ideographic characters as the target of ruby annotations. There are guidelines for the choice of ruby targets. How do others feel?

himorin commented 1 year ago

note, related discussion in JLreq: https://github.com/w3c/jlreq/wiki/English-ruby-terminology

r12a commented 1 year ago

Could someone explain to me (or point to) how AccessibilitySummary is used?

I have a couple of clarification questions, if i may:

It seems @murata2makoto that you are essentially asking for a flag to indicate whether a text has full ruby or not full ruby. How would such a flag be used and by whom? And are we talking about something that may appear in markup, or in some accompanying metadata, such as a manifest?

Would there be a need to specify the items annotated, rather than just whether it's full/not full? For example, to say the annotations cover Joyo Kanji, or Joyo+, etc. I'm guessing that would be difficult, but i just wanted to check.

Also, what's the situation for Chinese (cc @xfq) and Mongolian (which are the other main candidates for use of ruby these days)?

mattgarrish commented 1 year ago

The accessibilitySummary property is a free-form text field where you can express information that isn't conveyed by the machine-processable fields like accessibilityFeature.

As @GeorgeKerscher mentioned earlier in this thread, if you were to put accessibilityFeature=rubyAnnotations then in the summary you could be more explicit about whether that means all characters are covered or only a subset.

The difficulty, of course, would come in locating content if it's in the summary. I'm assuming in this case the metadata is being designed for EPUBs, so it would go into a bookstore catalogue to allow readers to filter books (but we try not to write the properties or definitions to be specific to a format). If it's in the summary, filtering becomes unlikely, as bookstores will show the summary but don't index it for searching.
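
A minimal sketch of the filtering problem, assuming a catalogue that indexes only accessibilityFeature values. The records, titles, and the `filter_by_feature` helper are made up for illustration.

```python
# Hypothetical catalogue records; titles and summaries are invented.
catalog = [
    {
        "title": "Book A",
        "accessibilityFeature": ["rubyAnnotations"],
        "accessibilitySummary": "Ruby is provided for every kanji.",
    },
    {
        "title": "Book B",
        "accessibilityFeature": ["rubyAnnotations"],
        "accessibilitySummary": "Ruby is provided only for kanji outside Joyo Kanji.",
    },
    {
        "title": "Book C",
        "accessibilityFeature": [],
        "accessibilitySummary": "",
    },
]


def filter_by_feature(records, feature):
    """Return titles whose machine-readable features include `feature`."""
    return [r["title"] for r in records if feature in r["accessibilityFeature"]]


print(filter_by_feature(catalog, "rubyAnnotations"))
```

The filter returns both Book A and Book B. The difference between them lives only in the free-text summary, which a reader can see but a filter like this cannot use.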

murata2makoto commented 1 year ago

It seems @murata2makoto that you are essentially asking for a flag to indicate whether a text has full ruby or not full ruby. How would such a flag be used and by whom? And are we talking about something that may appear in markup, or in some accompanying metadata, such as a manifest?

Such a flag is useful for those Japanese who have particular problems with CJK ideographic characters.

Would there be a need to specify the items annotated, rather than just whether it's full/not full? For example, to say the annotations cover Joyo Kanji, or Joyo+, etc. I'm guessing that would be difficult, but i just wanted to check.

I do not see book catalogs these days. But when I did, they indicated whether a book used 総ルビ or パラルビ. I have never seen a book catalog indicate that ideographic characters beyond the elementary-school level have ruby annotations.

Also, what's the situation for Chinese (cc @xfq) and Mongolian (which are the other main candidates for use of ruby these days)?

In my understanding, the automatic generation of ruby annotations works much better for Chinese than for Japanese. Even if authors or publishers do not embed ruby annotations, I guess that future Chinese users will be able to completely rely on the automatic generation of ruby annotations by the EPUB reading system.

r12a commented 1 year ago

It seems @murata2makoto that you are essentially asking for a flag to indicate whether a text has full ruby or not full ruby. How would such a flag be used and by whom? And are we talking about something that may appear in markup, or in some accompanying metadata, such as a manifest?

Such a flag is useful for those Japanese who have particular problems with CJK ideographic characters.

@murata2makoto I assume that already, because you are recommending it, but i'm trying to understand better in what way and what circumstances it is useful. And how you envisage the information could be stored.

murata2makoto commented 1 year ago

Such a flag is useful for those Japanese who have particular problems with CJK ideographic characters.

@murata2makoto I assume that already, because you are recommending it, but i'm trying to understand better in what way and what circumstances it is useful. And how you envisage the information could be stored.

The most common request by users of DAISY textbooks in Japan is 総ルビ ("general-ruby" in JLreq, and "fully-annotated ruby"). They have particular problems with CJK ideographic characters. パラルビ ("para-ruby" in JLreq and "partially-annotated ruby") is not good enough for them.

When such users would like to buy an EPUB publication, they would like to know if it has 総ルビ. If its metadata announces パラルビ rather than 総ルビ, they will not buy the book. Accessibility metadata for EPUB publications is embedded within EPUB publications but may well be used by e-book stores.

Is this good enough?

r12a commented 1 year ago

@murata2makoto that helps me understand much better, thanks. So it seems to me that we don't really need to worry about precise terminology for what some call 'para-ruby', because it's basically equivalent to fully-ruby-annotated = off.

murata2makoto commented 1 year ago

@r12a

fully-ruby-annotated = off.

This would also allow publications to have no ruby annotations. Not distinguishing ruby-free publications and "para-ruby" publications sounds strange to me.

BTW, ruby is suboptimal. Some dyslexic users are confused by ruby since they cannot separate base text and ruby annotations. Such users might want to search for ruby-free publications. (But there are other remedies such as changing the color of ruby annotations and widening the gap between base text and ruby annotations.)

r12a commented 1 year ago

Ok. So something like:

ruby-annotation-coverage: full | none

Another value could be partial, if there's an identified need for that. If there is, that avoids the para terminology issue.
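
For illustration only, the suggested value set could be written as an enumeration. The class name `RubyAnnotationCoverage` and the `parse_coverage` helper are hypothetical, and `partial` is included only to show where it would go if a need for it is identified.

```python
from enum import Enum


class RubyAnnotationCoverage(Enum):
    """Hypothetical value set for a ruby-annotation-coverage property."""
    FULL = "full"
    NONE = "none"
    PARTIAL = "partial"  # only if there is an identified need


def parse_coverage(value):
    """Parse a metadata string; raises ValueError for unknown values."""
    return RubyAnnotationCoverage(value.strip().lower())


print(parse_coverage("full"))
```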

murata2makoto commented 1 year ago

Ok. So something like:

ruby-annotation-coverage: full | none

This works for publications without ruby and for publications in which every CJK character has ruby. Are you saying that the absence of your ruby-annotation-coverage implies "para-ruby"?

Another value could be partial, if there's an identified need for that. If there is, that avoids the para terminology issue.

I am now wondering if we should stick to "para-ruby". Its meaning is not clear but "partial" appears to invite misunderstanding by non-Japanese people. I strongly want to avoid coining new terms since everybody in the Japanese publishing industry knows "para-ruby" already.

murata2makoto commented 1 year ago

Another value could be partial, if there's an identified need for that.

Those who have particular problems with CJK ideographic characters would like to know which publication has "general ruby" and which does not. They would like to know even before buying EPUB publications. Those who did not go to universities might want to buy books only when they have ruby ("general ruby" or "para ruby"). Some dyslexic users love ruby-free publications.

xfq commented 1 year ago

Also, what's the situation for Chinese (cc @xfq) and Mongolian (which are the other main candidates for use of ruby these days)?

This situation exists, but I'm not aware of any special term. I will ask the layout experts.

In my understanding, the automatic generation of ruby annotations works much better for Chinese than for Japanese. Even if authors or publishers do not embed ruby annotations, I guess that future Chinese users will be able to completely rely on the automatic generation of ruby annotations by the EPUB reading system.

Indeed, the automatic generation of ruby annotations works better for Chinese than for Japanese, but there may still be errors for rare ideographic characters and heteronyms. In general, I think the demand for ruby in Chinese is lower than in Japanese.

murata2makoto commented 1 year ago

We already have "rubyAnnotations" as a value of the accessibilityFeature property. If we add "fullRubyAnnotations", we can avoid choosing a term for para-ruby. "fullRubyAnnotations" is used only when every CJK ideographic character has a ruby annotation. "rubyAnnotations" is used for para-ruby.

I will create a pull request based on this idea.
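
A rough sketch of how a publisher's tooling might choose between the two values under this idea. It assumes exactly one value is emitted per publication (whether a fully annotated publication should also carry "rubyAnnotations" is not settled here), and the helper name `ruby_feature_values` is hypothetical.

```python
def ruby_feature_values(annotated_flags):
    """Map per-character ruby flags to accessibilityFeature values.

    `annotated_flags` holds one boolean per CJK ideographic character in the
    publication: True if that character carries a ruby annotation.
    Assumption: exactly one value is emitted, never both.
    """
    flags = list(annotated_flags)
    if flags and all(flags):
        return ["fullRubyAnnotations"]  # every ideograph has ruby
    if any(flags):
        return ["rubyAnnotations"]      # para-ruby: only some have ruby
    return []                           # ruby-free, or no ideographs at all


print(ruby_feature_values([True, True, True]))
print(ruby_feature_values([True, False, True]))
print(ruby_feature_values([False, False]))
```

The three outcomes correspond to general ruby, para-ruby, and ruby-free publications, so a store could filter for any of them.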

mattgarrish commented 1 year ago

Closing with #79