Digital native version of JLReq, discuss goals and changes from the current version

kidayasuo commented 3 years ago

(This part is an evolving document describing issues and proposed changes)

Issues with the current version of JLReq, especially when we want to apply it to digital text:

The rules assume a print workflow that relies on manual proofreading to handle some exceptional cases. As a result, there are cases that are hard to automate.
Some of the rules no longer make sense for digital text because they reflect limitations of non-digital production methods, including metal type.
No prioritization of the various features - the whole is a high bar; which parts should be considered table-stakes for all engines? Which are inseparable? The basic rules can be simplified to encourage implementations by developers world-wide.
There are concepts that are obvious to professionals in the field who are native to the language, but not to implementors of layout software elsewhere in the world.
It requires non-obvious exercises to apply it to an internationalized software environment, e.g. Unicode.
Whenever there was not a single clear answer for a given practice, the topic was left out altogether, even though some discussion would be very useful to implementors (as they often have to implement edge cases anyway).

Reflecting these issues and new opportunities, possible changes could be:

Describe a set of simplified basic rules that are the absolute minimum. Describe advanced features as options.
Improve affinity with internationalized software environments, for example by basing it on Unicode.
Add descriptions for concepts that are not necessarily obvious to software developers whose knowledge is of English typography. It would also help to explain, for each rule, why it is relevant.
When there are multiple possible layout options, describe the default, or how one can choose one method as the default.
Describe rules for edge cases where possible to make it robust against exceptional cases, i.e. help automate the layout.
Possible new content for digital text, especially for the reflowable architecture? e.g. optimum combinations of line length and line height considering the purpose of the text, improving layout for dyslexia, consideration for screen size, etc.
Consideration for types of text other than regular books? e.g. magazines, everyday digital text such as email and presentations?
Add a section on combining Japanese fonts with English and other fonts: issues such as baseline positioning, relative size, matching typeface design, etc.

Who is the target audience (and what is their priority), by the way? Is the following reasonable?

developers
people who make ebooks / web pages, UI designers, etc.; anyone who makes content

himorin commented 3 years ago

As listed, we have so far mainly discussed character classes and the handling of characters in running text, including ruby. But as discussed for the LaTeX jlreq.cls at https://github.com/abenori/jlreq/issues/85#issuecomment-869298606 , there is also room to discuss pagination, such as page numbering and page breaks, which do not mean much in electronic books (PDF or EPUB-RS) when the reader does not support two-page spread display. It would be worth asking how many reading applications and EPUB-RS implementations support spreads, and how page elements such as running heads are handled in electronic editions. So it might be good to add guidance for HTML/CSS use beyond web pages, for example on page formats and running heads.

macnmm commented 3 years ago

my presentation to TPAC 2019 on my thoughts for the future of JLReq: https://lists.w3.org/Archives/Public/www-archive/2019Sep/att-0003/TPAC_JLREQ_2019.pdf

and notes

macnmm commented 3 years ago

...and the original F2F discussion that became the above: https://www.dropbox.com/s/tuyelwb0pb4fz6f/2019.05.20%20JLReqv2%20F2F.pdf?dl=0

kidayasuo commented 3 years ago

@macnmm could you expand on part of your preso inline, i.e. what needs to be changed or added? Thanks!

macnmm commented 2 years ago

I see several areas that could be expanded or added:

The importance of type placement relative to the CJK embox -- how text object boundaries, line boxes, text run alignment, leading, white space around the text, etc. are all conceived relative to the embox (type virtual body) rather than to the Latin baseline or other Latin metric (even though fonts are built using Latin metric-based tools, and digital rendering pipelines use the Latin baseline as the origin point). This means engines must introduce the embox concept throughout their implementation and switch between it and the Latin-based conventions when composing text using Japanese rules.
The issue of how a given character's attributes or composition rules are governed not only by the Unicode code point, but also the font's implementation of features and glyph design to result in a final glyph. Perhaps a diagram would be helpful to show this. This issue touches mojikumi class determination, vertical variant and vertical orientation, kinsoku line break rules, etc.
The basic mojikumi aki rules and classes were conceived in an age when production tooling used fixed widths for the characters and made adjustments to the spacing in fractional units of 1 em (e.g. 1/2 em aki). The normal width of the line would be in whole em units, and thus monospaced Japanese text would often fit perfectly, or be easily adjusted to fit perfectly. Modern-day fonts and typographic practice, however, have evolved to intersperse proportional typography with monospaced Japanese text, while the old rules and conventions for adjusting spacing have remained unchanged or have strayed into undefined behaviors. Text with lots of Latin script words, or even with proportional kana and punctuation glyphs, strains the old adjustment rules when they are used to create the same neatly aligned text as before. In fact, such neat alignment may not even be desirable, yet we have no replacement for the basic mojikumi aki adjustment rules to serve these new use cases.
Reflowable document creation is even more complex in that all the mojikumi aki spacing has to be expressed in ranges, the text object sizing may not be in even em units, the line breaking will be unpredictable, etc. We should state clearly which conventions from static document creation should be preserved (e.g. the embox-based measurements and dimensions) and which can be relaxed, so a natural hierarchy of rules and conventions can be understood.
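To make the last two points concrete, here is a rough sketch (not taken from JLReq) of why a line of full-width glyphs fits a whole-em measure exactly, while a line with a proportional Latin word leaves a remainder that has to be absorbed by aki expressed as ranges. The `Aki` class, the function names, the 13/5 em word width, and the 1/8 to 1/2 em range around a 1/4 em optimum are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Aki:
    """An adjustable space in em, expressed as a range (hypothetical model)."""
    minimum: Fraction
    optimum: Fraction
    maximum: Fraction


def natural_width(glyph_widths, akis):
    """Line width in em when every aki sits at its optimum."""
    return sum(glyph_widths, Fraction(0)) + sum((a.optimum for a in akis), Fraction(0))


def fit_line(glyph_widths, akis, measure):
    """Stretch or shrink the akis so the line exactly fills the measure.

    Returns the adjusted aki widths, or None if the ranges cannot absorb
    the difference (the line would need a different break).
    """
    slack = measure - natural_width(glyph_widths, akis)
    if slack == 0:
        return [a.optimum for a in akis]
    # Room available in the direction the line has to move.
    if slack > 0:
        room = [a.maximum - a.optimum for a in akis]
    else:
        room = [a.optimum - a.minimum for a in akis]
    total_room = sum(room, Fraction(0))
    if total_room < abs(slack):
        return None
    # Share the slack in proportion to each aki's available room.
    return [a.optimum + slack * r / total_room for a, r in zip(akis, room)]


# Assumed range for the space between Japanese and Latin text.
QUARTER_EM = Aki(Fraction(1, 8), Fraction(1, 4), Fraction(1, 2))

full_width_line = [Fraction(1)] * 20                 # 20 full-width glyphs
mixed_line = [Fraction(1)] * 17 + [Fraction(13, 5)]  # 17 glyphs + one Latin word

print(fit_line(full_width_line, [], Fraction(20)))                    # → []
print(fit_line(mixed_line, [QUARTER_EM, QUARTER_EM], Fraction(20)))   # → [Fraction(1, 5), Fraction(1, 5)]
print(fit_line(mixed_line, [QUARTER_EM, QUARTER_EM], Fraction(19)))   # → None
```

In a reflowable layout the measure itself may not be a whole number of em, so the first case (no adjustment at all) becomes the exception rather than the norm.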

kidayasuo commented 2 years ago

Thank you @macnmm for your insights.

As for point 1, can we expect most web browsers, email, memo applications, etc. to switch to the embox model? It would depend on the amount of work required, and on how serious the issue is when the model is not implemented. In JLReq I believe we should explain best practices with the Latin baseline model, and explain what drawbacks it has. That would explain why one might want to implement the embox model. It would be super if you could write it up. I am very much looking forward to learning about it, as I myself do not understand the issue well enough.
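As a starting point for comparing the two models, here is a minimal sketch of converting between baseline-relative and embox-relative positions. The 880/-120 split of a 1000-unit em is a convention often seen in Japanese fonts, not something stated in this thread, and the class and function names are hypothetical; a real engine would read these values from the font (for example from the OpenType BASE table) rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmBox:
    """Ideographic em box in font units, measured from the Latin baseline (y up)."""
    units_per_em: int = 1000
    top: int = 880       # assumed value; read from the font in practice
    bottom: int = -120   # assumed value; read from the font in practice

    def edges(self, font_size: float):
        """Return (top, bottom) offsets from the Latin baseline at font_size."""
        scale = font_size / self.units_per_em
        return self.top * scale, self.bottom * scale

    def center(self, font_size: float) -> float:
        """Offset of the embox center from the Latin baseline at font_size."""
        top, bottom = self.edges(font_size)
        return (top + bottom) / 2


def shift_for_center_alignment(box: EmBox, base_size: float, run_size: float) -> float:
    """Upward shift to apply to a run's baseline so that its embox center
    lines up with the embox center of text set at base_size."""
    return box.center(base_size) - box.center(run_size)


box = EmBox()
# With pure baseline alignment an 8px run simply shares the 16px baseline.
# With embox-center alignment it has to be raised by this amount:
print(shift_for_center_alignment(box, base_size=16, run_size=8))
```

With these assumed metrics the smaller run is raised by roughly 3 px, which is the kind of difference a baseline-only engine would not produce without extra work.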

Could you elaborate on point 2? As you mentioned, a diagram would greatly help.

I completely agree with your points 3 and 4. Could you open a separate GH issue to start developing new rules? What are the points to be discussed?

macnmm commented 2 years ago

> I completely agree with your points 3 and 4. Could you open a separate GH issue to start developing new rules? What are the points to be discussed?

Added #296

macnmm commented 2 years ago

diagram describing engine issue with ambiguous Unicode that is not solved with UAX50 but could be solved with CDEF?

kidayasuo commented 2 years ago

> diagram describing engine issue with ambiguous Unicode that is not solved with UAX50 but could be solved with CDEF?

On this point, is there anything that the JLReq TF can do to solve the issue? These are among the 9 code points that many text editing applications render as proportional by default, even though they are also used in Japanese layout.

Unicode	Character name
U+2018	LEFT SINGLE QUOTATION MARK
U+201C	LEFT DOUBLE QUOTATION MARK
U+2019	RIGHT SINGLE QUOTATION MARK
U+201D	RIGHT DOUBLE QUOTATION MARK
U+2010	HYPHEN
U+2013	EN DASH
U+2014	EM DASH
U+2025	TWO DOT LEADER
U+2026	HORIZONTAL ELLIPSIS
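One way to see why engines struggle with these code points is to look at their East_Asian_Width property, which Python's standard `unicodedata` module exposes. As far as I know, all nine are classified "A" (ambiguous) in the Unicode data, meaning the code point alone does not say whether a full-width or a proportional glyph is intended. The list name and the `describe` helper below are just for illustration.

```python
import unicodedata

# The nine code points from the table above.
AMBIGUOUS_PUNCTUATION = [
    0x2018, 0x201C, 0x2019, 0x201D,
    0x2010, 0x2013, 0x2014,
    0x2025, 0x2026,
]


def describe(code_point):
    """Return (U+XXXX label, character name, East_Asian_Width) for a code point."""
    ch = chr(code_point)
    return (
        f"U+{code_point:04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
    )


for label, name, width in map(describe, AMBIGUOUS_PUNCTUATION):
    print(label, name, width, sep="\t")
```

A layout engine therefore needs something beyond the code point (language tagging, the chosen font, or an explicit style) to decide how these characters should be set.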

acli commented 2 years ago

I think the situation with U+2026 is hopeless; it’s really a bug in Unicode, not anything the W3C can fix. They should never have merged the European ellipsis with the CJK half-ellipsis; they are completely different glyphs that only happen to sometimes look the same. A minimal pair, in linguistics terms. They never did their linguistic analysis right.

The same can probably be said of at least some of the others, but it’s harder to make the case. But of course in CJK typography we don’t really work in ems (which really is the root cause of some of these problems – and this myth that the em is a valid unit in CJK is being perpetuated at least in CLreq); they got their basic unit wrong and I’m not sure if there’s anything that can be done to fix the whole mess.

macnmm commented 2 years ago

There are any number of examples of needless duplication of the alphabet in Unicode if you base your unification decision on appearance/semantics alone. In the case of the above list, the primary reason they should not have been unified is that their treatment when designed as full-width glyphs makes them essentially different characters from their Latin counterparts, especially in vertical text, but also when composing horizontally. I wish this could be fixed, but we seem to be stuck with awkward work-arounds and lots of necessary user education.

murata2makoto commented 2 years ago

Proposed addition to Kida-san's list:

Japanese typography on the Web is simply a scaled-down reproduction of the printing tradition from the Meiji era. Since printed materials are unreadable for those who have print disabilities, Japanese text on the Web is not very accessible. One of the goals of the new JLreq is to revisit traditional typographical features for better accessibility on digital devices.

acli commented 2 years ago

Do you have examples of specific features in traditional Japanese typography that are inaccessible on the web? I might be ignorant since I don’t speak Japanese, but I can’t think of anything obvious (other than multiple pronunciations) that would make Japanese especially inaccessible.

kidayasuo commented 2 years ago

I believe what he meant is dyslexia: difficulty in reading even though the eyes function normally. It is a collective term for many different symptoms, but some people have difficulty tracking lines, especially in vertical orientation, and some have difficulty separating ruby from the base text, etc. I believe (but am not sure) there are small changes / considerations we can make to make reading easier for such people, especially when their symptoms are relatively light, and sometimes such changes make reading easier for everyone.

@murata2makoto san, correct me or supplement if necessary.

murata2makoto commented 2 years ago

@acli

> The character size of ruby characters is, in principle, the half size of the base characters (see Figure 114).

This is quoted from JLreq 3.3.3.

IMHO, this convention was introduced just because it is convenient for letterpress printing. It has nothing to do with readability. In particular, it is hard to read for people with low vision. It has been reported that taller ruby characters are more readable.

acli commented 2 years ago

> The character size of ruby characters is, in principle, the half size of the base characters (see Figure 114).

Ok, this is fair. Thanks very much for the insight. This is an aspect of accessibility that’s often not talked about, I’d say, but wouldn’t you agree that if we’re talking about the web, this is actually a case of “reverse discrimination”? Blind people would in theory be able to read the ruby; low-vision people would be able to use zoom; it’s people with “normal” vision (who won’t or can’t use zoom) who are impacted.

Half of 12pt (not really the size of normal “print” on the web) is 6pt. It’s quite well below the threshold of legibility. I’d totally agree that taller ruby would be more legible.

ETA: I’m not against taller ruby, but if we want to maintain tradition, wouldn’t it make more sense to work backwards: because ruby is 50% of the size of normal characters, Web pages should be displayed at 2rem or larger (okay, this is ridiculously large, maybe we do need taller ruby as a compromise) so that ruby is at least 1rem? This is how we handle it in English (for superscripts etc. that are pretty much ruby-sized), at least in theory.
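The arithmetic in the last two paragraphs can be spelled out as a small sketch. The function names are made up for illustration, the 50% ratio is the JLreq principle quoted earlier, and nothing here says what the legibility threshold should actually be.

```python
def ruby_size(base_size: float, ratio: float = 0.5) -> float:
    """Ruby size for a given base size, in the same unit as base_size."""
    return base_size * ratio


def min_base_size(min_ruby_size: float, ratio: float = 0.5) -> float:
    """Smallest base size that keeps ruby at or above min_ruby_size."""
    return min_ruby_size / ratio


print(ruby_size(12))        # 12pt base text → 6.0 (pt) ruby
print(min_base_size(1.0))   # ruby of at least 1rem → 2.0 (rem) base text
```

Working forwards from the body size gives ruby that is too small; working backwards from a minimum ruby size gives body text that is too large, which is the tension described above.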

w3c / jlreq

Digital native version of JLReq, discuss goals and changes from the current version #281