w3c / jlreq

Text Layout Requirements for Japanese
https://w3c.github.io/jlreq/

Digital native version of JLReq, discuss goals and changes from the current version #281

Open kidayasuo opened 3 years ago

kidayasuo commented 3 years ago

(This part is an evolving document describing issues and proposed changes)

Issues with the current version of JLReq, especially when we want to apply it to digital text:

Reflecting these issues and new opportunities possible changes could be:

By the way, who is the target audience, and in what order of priority? Is the following reasonable?

  1. developers
  2. people who make ebooks or web pages, UI designers, etc.; anyone who creates content

himorin commented 3 years ago

As the list shows, our discussion of the update has so far focused on character classes and on the handling of inline text, including ruby. However, as discussed for the LaTeX jlreq.cls class at https://github.com/abenori/jlreq/issues/85#issuecomment-869298606, there is also room to discuss page-level matters such as page numbering and page boundaries, which carry little meaning in electronic books (PDF or EPUB-RS) that do not support two-page spread display. How many applications and EPUB-RS actually let users read in spreads, and how page elements such as running heads are handled in electronic editions, are open questions. So it may be worth adding guidance for HTML/CSS uses other than web pages, covering page formats, running heads, etc.

macnmm commented 3 years ago

My presentation at TPAC 2019, with my thoughts on the future of JLReq: https://lists.w3.org/Archives/Public/www-archive/2019Sep/att-0003/TPAC_JLREQ_2019.pdf

and notes

macnmm commented 3 years ago

...and the original F2F discussion that became the above: https://www.dropbox.com/s/tuyelwb0pb4fz6f/2019.05.20%20JLReqv2%20F2F.pdf?dl=0

kidayasuo commented 3 years ago

@macnmm could you expand on part of your presentation inline, i.e. what needs to be changed or added? Thanks!

macnmm commented 2 years ago

I see several areas that could be expanded or added:

kidayasuo commented 2 years ago

Thank you @macnmm for your insights.

As for point 1, can we expect most web browsers, email clients, memo applications, etc. to switch to the embox model? That would depend on the amount of work required and on how serious the problems are when it is not implemented. I believe JLreq should explain best practices under the Latin baseline model and the drawbacks that model has, which would in turn explain why one might want to implement the embox model. It would be great if you could write this up. I am very much looking forward to learning about it, as I do not understand the issue well enough myself.

Could you elaborate on point 2? As you mentioned, a diagram would greatly help.

I completely agree with your points 3 & 4. Could you open a separate GH issue to start developing new rules? What are the points to be discussed?

macnmm commented 2 years ago

> I completely agree with your points 3 & 4. Could you open a separate GH issue to start developing new rules? What are the points to be discussed?

Added #296

macnmm commented 2 years ago

diagram describing engine issue with ambiguous Unicode that is not solved with UAX50 but could be solved with CDEF?

kidayasuo commented 2 years ago

> diagram describing engine issue with ambiguous Unicode that is not solved with UAX50 but could be solved with CDEF?

On this point, is there anything the JLReq TF can do to solve the issue? These are some of the nine code points for which many text editing applications default to proportional glyphs, even though the characters are also used in Japanese layout.

Unicode   Character name
U+2018    LEFT SINGLE QUOTATION MARK
U+201C    LEFT DOUBLE QUOTATION MARK
U+2019    RIGHT SINGLE QUOTATION MARK
U+201D    RIGHT DOUBLE QUOTATION MARK
U+2010    HYPHEN
U+2013    EN DASH
U+2014    EM DASH
U+2025    TWO DOT LEADER
U+2026    HORIZONTAL ELLIPSIS
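
As a rough illustration of how an application might at least flag these characters for review, here is a minimal Python sketch. The set contains only the nine code points listed above; the names `AMBIGUOUS_CODE_POINTS` and `find_ambiguous` are hypothetical and not part of JLReq or any existing tool. It only locates the characters; it does not decide whether a proportional or a full-width glyph is appropriate.

```python
import unicodedata

# The nine code points from the list above (hypothetical constant name).
AMBIGUOUS_CODE_POINTS = frozenset(
    "\u2018\u201C\u2019\u201D\u2010\u2013\u2014\u2025\u2026"
)


def find_ambiguous(text):
    """Return (index, 'U+XXXX', Unicode name) for each listed code point in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in AMBIGUOUS_CODE_POINTS
    ]


if __name__ == "__main__":
    # Japanese text mixed with curly quotes and an ellipsis.
    for hit in find_ambiguous("「はい」と\u201Cno\u201D\u2026"):
        print(hit)
```

A tool could use such a report to prompt the author to choose a glyph width or font for each occurrence, since the code point alone does not carry that information.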

acli commented 2 years ago

I think the situation with U+2026 is hopeless; it’s really a bug in Unicode, not something the W3C can fix. They should never have merged the European ellipsis with the CJK half-ellipsis; they are completely different glyphs that only happen to sometimes look the same. A minimal pair, in linguistics terms. They never did their linguistic analysis right.

The same can probably be said of at least some of the others, but it’s harder to make the case. But of course in CJK typography we don’t really work in ems (which really is the root cause of some of these problems, and this myth that the em is a valid unit in CJK is being perpetuated at least in CLreq); they got their basic unit wrong and I’m not sure there’s anything that can be done to fix the whole mess.

macnmm commented 2 years ago

There are any number of examples of needless duplication of the alphabet in Unicode if you base your unification decision on appearance/semantics alone. In the case of the above list, the primary reason they should not have been unified is that, when designed as full-width glyphs, they are treated so differently that they are essentially different characters from their Latin counterparts, especially in vertical text but also in horizontal composition. I wish this could be fixed, but we seem to be stuck with awkward workarounds and lots of necessary user education.

murata2makoto commented 2 years ago

Proposed addition to Kida-san's list:

Japanese typography on the Web is simply a scaled-down reproduction of the printing tradition from the Meiji era. Since printed materials are unreadable for those who have print disabilities, Japanese text on the Web is not very accessible. One of the goals of the new JLreq is to revisit traditional typographical features for better accessibility on digital devices.

acli commented 2 years ago

Do you have examples of specific features in traditional Japanese typography that are inaccessible on the web? I might be ignorant since I don’t speak Japanese, but I can’t think of anything obvious (other than multiple pronunciations) that would make Japanese especially inaccessible.

kidayasuo commented 2 years ago

I believe he is referring to dyslexia, that is, people who have difficulty reading even though their eyes function normally. It is a collective term for many different symptoms: some people have difficulty tracking lines, especially in vertical orientation, and some have difficulty separating ruby from the base text, etc. I believe (though I am not sure) there are small changes and considerations we can make to ease reading for such people, especially when their symptoms are relatively mild, and sometimes such changes make reading easier for everyone.

@murata2makoto san, correct me or supplement if necessary.

murata2makoto commented 2 years ago

@acli

> The character size of ruby characters is, in principle, the half size of the base characters (see Figure 114).

This is quoted from JLreq 3.3.3.

IMHO, this convention was introduced just because it is convenient for letterpress printing. It has nothing to do with readability. In particular, it is hard to read for people with low vision. It has been reported that taller ruby characters are more readable.

acli commented 2 years ago

> The character size of ruby characters is, in principle, the half size of the base characters (see Figure 114).

Ok, this is fair. Thanks very much for the insight. This is actually an aspect of accessibility that’s often not talked about, I’d say, but wouldn’t you agree that if we’re talking about the web, this is actually a case of “reverse discrimination”? Blind people would in theory be able to read the ruby; low-vision people would be able to use zoom; it’s people with “normal” vision (who won’t or can’t use zoom) who are impacted.

Half of 12pt (not really the size of normal “print” on the web) is 6pt. It’s quite well below the threshold of legibility. I’d totally agree that taller ruby would be more legible.

ETA: I’m not against taller ruby, but if we want to maintain tradition, wouldn’t it make more sense to work backwards and say that, because ruby is 50% of the size of normal characters, Web pages should be displayed at no less than 2rem (okay, this is ridiculously large, maybe we do need taller ruby as a compromise) so that ruby is at least 1rem? This is how we handle it in English (for superscripts etc. that are pretty much ruby-sized), at least in theory.
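
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 0.5 ratio is the half-size convention quoted from JLreq 3.3.3, and the function names `ruby_size` and `min_base_size` are made up for this example.

```python
# Half-size convention quoted from JLreq 3.3.3 (ruby is half the base size).
RUBY_RATIO = 0.5


def ruby_size(base_size, ratio=RUBY_RATIO):
    """Ruby character size for a given base character size (same unit in and out)."""
    return base_size * ratio


def min_base_size(min_ruby_size, ratio=RUBY_RATIO):
    """Smallest base size that keeps ruby at or above min_ruby_size."""
    return min_ruby_size / ratio


if __name__ == "__main__":
    print(ruby_size(12))       # 12pt base text gives 6.0pt ruby
    print(min_base_size(1.0))  # ruby of at least 1rem needs a 2.0rem base
```

Changing `ratio` shows the trade-off: any ratio above 0.5 (larger or taller ruby) lowers the base size needed to keep ruby legible.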