Open himorin opened 4 years ago
For point 2.a, Bin-sensei pointed out two possible baselines:
a. Define 'Character class' to describe the general layout method of individual characters, and do not include classes for specific formatting (like ruby), which should be described under each formatting method.
b. Layout between a text block with a specific layout method and the text before/after it could differ from the layout of the original character classes. Keep additional 'Character classes' to be used for defining such points.
Shimono-san, thank you very much for bringing this up and making a summary.
Expanding JLReq to Unicode, or in a more generic sense making JLReq interoperable with Unicode, is I think the biggest challenge in bringing JLReq to the next level (or the next major version). It is about making it future-compatible.
It is rather a complex task as Shimono-san outlined. JLReq's character class is a combination of static property of characters and the context, where the character is used. We need to separate the context from the static property. It is a major architectural conversion which requires many rewrites.
and then we would re-define JLReq character classes using Unicode character properties. There might be cases where the current Unicode property is not sufficient to differentiate necessary behaviours.
In the process we might find cases where JLReq can be simplified (especially because the next major version will be devoted to digital text). Also in the process I believe there will be cases where we need clearer ideas on how each character, especially each symbol, is to be used. That will lead to some guideline-ish descriptions in the document (for this we need to be careful, because we are not in a position to define the orthography of the language).
I proposed an online meeting to discuss Bin-sensei's proposal for separating the context classes.
One issue I see with the idea of adopting Unicode Character Property as a descriptor for use in JLReq:
JLReq mojikumi classes (and JIS X 4051 mojikumi classes) group characters according to spacing convention and the need to differentiate spacing rules among characters of the same semantic type, e.g. U+FF08 （ and U+300C 「. Both of those characters are broadly categorized as Opening Punctuation, but the spacing rules can differ, so in mojikumi classifications they are distinct. I am not sure if the intent of this proposal is to introduce such granularity into the Unicode Character Property just for the sake of supporting Japanese publishing spacing rules, but if not, then I think converting JLReq to use them will be a lossy conversion. Unicode's unification of punctuation and certain Latin, Cyrillic, and Greek characters to one code point, whereas historically in Japanese fonts such characters were distinct (and their encoding in SJIS distinct from that in ASCII), has caused a similar lossy problem when composing text in various Japanese fonts of different vintages. Some fonts have U+201C “ as a full-width SJIS-like glyph, others treat that code point as proportional, and the mojikumi spacing rules are different (the classes are different), yet this cannot be expressed in Unicode alone.
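The granularity concern above can be checked against the Unicode Character Database as exposed by Python's standard `unicodedata` module. This is only a minimal sketch: it prints Unicode's own General_Category and East_Asian_Width values, not JLReq or JIS X 4051 classes, and the helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return (General_Category, East_Asian_Width) for one character."""
    return unicodedata.category(ch), unicodedata.east_asian_width(ch)

fullwidth_paren = "\uFF08"  # FULLWIDTH LEFT PARENTHESIS
corner_bracket = "\u300C"   # LEFT CORNER BRACKET
left_dquote = "\u201C"      # LEFT DOUBLE QUOTATION MARK

for ch in (fullwidth_paren, corner_bracket, left_dquote):
    gc, eaw = describe(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: gc={gc}, eaw={eaw}")
```

Both brackets report `gc=Ps`, so General_Category alone does not separate them, and U+201C reports East Asian Width `A` (ambiguous), which is consistent with its width depending on the font.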
Before modifying the jlreq text, we may need to be clear if our plan is to modify the existing jlreq, or to rewrite jlreq to be digital-oriented and international-oriented and use the new Unicode based definition here. If the latter is the goal, what is the general structure of the new document? (Maybe a separate issue is needed.)
Before modifying the jlreq text, we may need to be clear if our plan is to modify the existing jlreq, or to rewrite jlreq to be digital-oriented and international-oriented and use the new Unicode based definition here.
For this, we aim to rewrite JLReq with international-oriented definitions; at the very least, this modification needs to be treated as a next edition (or an amendment). As for digital-oriented, some items are related, but there is no solid (or even rough) idea for now.
If the latter is the goal, what is the general structure of the new document? (Maybe a separate issue is needed.)
For this META issue, no large restructuring is planned. We expect it to introduce modifications to some (sub-)sections of the main text and appendix.
It is rather a complex task as Shimono-san outlined. JLReq's character class is a combination of static property of characters and the context, where the character is used. We need to separate the context from the static property. It is a major architectural conversion which requires many rewrites.
I agree that one way we can make the classifications of characters in JLReq more compatible with those in Unicode is to separate the static property so that the static properties necessary for Japanese layout can be expressed in more universal (or Unicode-compatible) terms. This would seem to mean the Unicode terms should be expanded to include such nuances for Japanese, and then for other languages with specific or unique layout rules as well.
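One way to picture the separation is as a data model in which the static class is looked up per character and the context is attached separately. The sketch below is purely illustrative: the class names, the `Context` values, and the tiny lookup table are hypothetical and are not taken from JLReq or Unicode.

```python
from dataclasses import dataclass
from enum import Enum, auto

class StaticClass(Enum):
    # Hypothetical static classes, for illustration only.
    OPENING_BRACKET = auto()
    CLOSING_BRACKET = auto()
    OTHER = auto()

class Context(Enum):
    # Hypothetical contexts in which a run of text is laid out.
    MAIN_TEXT = auto()
    RUBY = auto()

# Static property: depends only on the character itself.
STATIC_TABLE = {
    "「": StaticClass.OPENING_BRACKET,
    "」": StaticClass.CLOSING_BRACKET,
}

@dataclass(frozen=True)
class Classified:
    char: str
    static: StaticClass
    context: Context

def static_class(ch):
    """Look up the context-free class of a character."""
    return STATIC_TABLE.get(ch, StaticClass.OTHER)

def classify(text, context):
    """Pair each character's static class with the context of its run."""
    return [Classified(ch, static_class(ch), context) for ch in text]

for item in classify("「字」", Context.RUBY):
    print(item.char, item.static.name, item.context.name)
```

Layout rules could then be written against the pair (static class, context) rather than against a single class that mixes the two.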
As to revising the description of how the contextual nature of Japanese layout rules works, that would seem to be something that can be expressed in JLReq similarly to how it already is for traditional printing and book typography. We can expand its scope into dynamic digital layout, expressing to an international audience what informs the practices of experts of Japanese layout in any medium, for example, the role of white space between characters, between lines, and flowing around objects as it relates to text.
I admit I have not read this w/ enough detail, but skimming this discussion it occurred to me that the problem is similar on a meta level to the Unicode IndicSyllabicCategories and IndicPositionalCategories, and of course the Unicode Vertical text properties.
How about proposing a set of categories to Unicode, with some defined as "derived" (some algorithmic combination of existing Unicode properties) and some explicitly assigned as a (more fine-grained) override?
Done that way, the end result would be a reliance on formal Unicode properties, but also, inside Unicode, the established derivations would surface any changes that might be (inadvertently) introduced by changes in the underlying Unicode properties (like general category or line break). If such properties must change in Unicode for some reason, it would be possible to adjust the derivation or attach an override to keep the layout properties unchanged. Conversely, if/when the layout properties need to be changed or corrected, that can be done by changing a derivation, changing an override, or changing an underlying Unicode property (if appropriate).
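The derived-plus-override idea can be sketched in a few lines. Everything named here is an assumption for illustration: the layout class names, the derivation table, the single override, and the `check_pins` helper are hypothetical, and only `unicodedata.category` is a real library call.

```python
import unicodedata

# Hypothetical derivation: General_Category -> layout class.
DERIVATION = {
    "Ps": "opening",
    "Pi": "opening",
    "Pe": "closing",
    "Pf": "closing",
}

# Hypothetical explicit overrides, finer grained than the derivation.
OVERRIDES = {
    "\u300C": "opening_corner_bracket",
}

def layout_class(ch):
    """Return the override if one exists, else the derived class."""
    if ch in OVERRIDES:
        return OVERRIDES[ch]
    return DERIVATION.get(unicodedata.category(ch), "other")

def check_pins(pins):
    """Return the characters whose current class differs from the pinned one.

    Running this against a new Unicode version would surface changes
    introduced by the underlying properties.
    """
    return {ch: layout_class(ch) for ch, want in pins.items()
            if layout_class(ch) != want}

print(layout_class("\uFF08"), layout_class("\u300C"))
print(check_pins({"\uFF08": "opening", "\u300D": "closing"}))
```

If an underlying property changed, the pinned check would report the affected characters, and the fix could be either a changed derivation or a new override entry.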
Getting this done may require that a Unicode technical report draft is created that defines the relation between standard Unicode properties and the (partially derived) layout properties.
@asmusf It is indeed a good suggestion for the task force to consider (although strictly speaking it's out of the scope of "layout requirements"), and there are similar suggestions in other issues as well (see the "Classes as a Unicode property" section in #242). I think we can discuss this idea in future JLReq meetings (and/or in GitHub).
existing issues:
During the Jan 2020 JL-TF F2F meeting, reorganization and upgrade of the character classes were discussed (email in Japanese). After a while, we developed a list of possible discussion and research items, as below. This issue is a META issue to track activities (incl. sub-issues) and possible action items. Pointers to discussions and important inputs (or summaries) are added to the bottom of this initial comment. During coordination, it was pointed out that updating the JLReq WG Note itself via individual issues/PRs is not a good way forward, and that having a separate document for the next character classes would make our work easier (a detailed plan will be proposed by @kidayasuo).
JL-TF meetings (agenda, notes, todos)
Inputs in mail list or github