Review Segment Break Transformation Rules (CSS Text Level 3)

w3c / jlreq

Text Layout Requirements for Japanese

https://w3c.github.io/jlreq/


Review Segment Break Transformation Rules (CSS Text Level 3) #211

Open kidayasuo opened 4 years ago

kidayasuo commented 4 years ago

There are discussions in CSS WG regarding Segment Break Transformation Rules:

https://drafts.csswg.org/css-text-3/#line-break-transform (what it is and the rule)
https://drafts.csswg.org/css-text-3/#space-discard-set (current charset)
https://github.com/w3c/csswg-drafts/issues/337 (what to do with EAW=Ambiguous? closed)
https://github.com/w3c/csswg-drafts/issues/4992 (how about enclosed ideographic?)
https://github.com/w3c/clreq/issues/293 (corresponding issue in clreq; a good illustration of what the Segment Break Transformation Rules are)
https://www.w3.org/TR/jlreq/ja/#character_classes (JLReq charsets)

We would like to review the rules to see whether there are any remaining issues or areas that need discussion.
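For readers unfamiliar with the mechanism, here is a minimal Python sketch of what a segment break transformation does: each line break in the source is either removed or turned into a space, depending on the characters on both sides. The function names are hypothetical, and `is_wide` is only a stand-in that uses East Asian Width F/W/H; the actual space-discarding set is defined in the linked draft and differs in detail.

```python
import unicodedata


def is_wide(ch: str) -> bool:
    # Assumption: East Asian Width F/W/H approximates the space-discarding set.
    # The real set is defined in the CSS Text 3 draft, not by this test.
    return unicodedata.east_asian_width(ch) in ("F", "W", "H")


def transform_segment_breaks(text: str) -> str:
    """Join source lines, dropping the break between two wide characters."""
    # Spaces and tabs around a segment break are removed before the rule runs.
    lines = [line.strip(" \t") for line in text.split("\n")]
    out = lines[0]
    for nxt in lines[1:]:
        before, after = out[-1:], nxt[:1]
        if before and after and is_wide(before) and is_wide(after):
            out += nxt          # break removed, no space inserted
        else:
            out += " " + nxt    # break converted to a single space
    return out
```

With this sketch, a break inside Japanese text disappears, while a break next to Latin text becomes a space.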

kidayasuo commented 4 years ago

[updated] Updated the data by removing entries that are actually fullwidth versions of a character, and by removing character classes that are inherently non-Japanese (cl-24 to cl-27). This makes the list easier to examine.

List of characters listed in JLReq that are not Space Discarding according to https://drafts.csswg.org/css-text-3/#space-discard-set

NOT_SpaceDiscarding_JLReq_char.txt

xfq commented 4 years ago

There's also https://github.com/w3c/csswg-drafts/issues/5017 , which is the new CSS issue for "ambiguous" characters.

kojiishi commented 4 years ago

The list is very helpful, thank you very much, @kidayasuo! It looks to me that the list is reasonable; i.e., the current set of space-discarding Unicode characters is reasonable from a JLReq perspective. /cc @fantasai

kidayasuo commented 4 years ago

A basic, but fundamental question. How much can we expect authors, or editor software that folds lines automatically, to cooperate? At one extreme, we could tell CJ authors to fold lines only between two kanji. Then we would not need any rule other than "the segment break transformation rule will not insert a space between two kanji". (Also, these expectations should probably be documented.)
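The extreme case above can be sketched in Python as follows. This is only an illustration under stated assumptions: `is_kanji` and `join_lines_kanji_only` are hypothetical names, and "kanji" is approximated by two code point ranges (CJK Unified Ideographs and Extension A), which is narrower than the full set of Han characters.

```python
def is_kanji(ch: str) -> bool:
    # Assumption: "kanji" means U+4E00..U+9FFF or U+3400..U+4DBF only.
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def join_lines_kanji_only(text: str) -> str:
    """Drop a line break only when it sits between two kanji."""
    lines = text.split("\n")
    out = lines[0]
    for nxt in lines[1:]:
        if out and nxt and is_kanji(out[-1]) and is_kanji(nxt[0]):
            out += nxt          # break between two kanji: no space
        else:
            out += " " + nxt    # any other break: a space is inserted
    return out
```

Under such a rule, a source line folded between hiragana would pick up a space, which is why it only works if authors and editor software fold lines between kanji.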

kojiishi commented 4 years ago

A basic, but fundamental question...

I think that is exactly where this is controversial. I'm in favor of making the rules as simple as possible, because no matter what we do, authors must remember all the rules and adapt to them. @r12a seems to have a similar opinion, if I understand his comment correctly. I see some people arguing that more rules can make it smarter. I agree they help in some cases, but authors must remember more.

kidayasuo commented 4 years ago

Thank you. I agree we should make the rule easier to remember, in other words, intuitive. It also needs to be reliable, and in that sense I am not very fond of the language tagging idea because it is more prone to errors.

One small caution is that, in general, things that look simpler to humans and are easier to remember do not necessarily match what is simple for rule makers. I think we should strive to devise "smart" rules that feel simpler to people, that is, to our users.