w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-text-3] Should enclosed counting rods / tai xuan jing / yi jing hexagrams be space-discarding? #4993

Closed fantasai closed 4 years ago

fantasai commented 4 years ago

In #337 we decided to key line-break transformation behavior by Unicode Block. Most of the blocks are pretty straightforward: Han, Kana, Yi, and CJK punctuation blocks discard, and everything else converts to a space. But there are a few interesting cases...

One interesting case is a set of symbols that seem to originate primarily in CJK usage: https://en.wikipedia.org/wiki/Yijing_Hexagram_Symbols_(Unicode_block) https://en.wikipedia.org/wiki/Taixuanjing https://en.wikipedia.org/wiki/Counting_Rod_Numerals_(Unicode_block)

Our intent is to discard if it's safe to do so (Chinese / Japanese context) but not otherwise (Korean, English, etc.). Note that we only discard if the characters on both sides of the line break (before and after) are part of the space-discarding character set.

What should we do with these blocks?
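The rule described above can be sketched in a few lines. This is a minimal illustration, not the spec's algorithm: the block list is a hand-picked subset (an assumption, not the normative set), the function names are hypothetical, and any other conditions the spec applies to segment breaks are ignored.

```python
# Illustrative subset of space-discarding Unicode blocks (assumption: this is
# NOT the spec's normative list). Ranges are inclusive code point bounds.
SPACE_DISCARDING_BLOCKS = [
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xA000, 0xA48F),  # Yi Syllables
    (0xA490, 0xA4CF),  # Yi Radicals
]


def is_space_discarding(ch: str) -> bool:
    """True if the character falls in one of the listed blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SPACE_DISCARDING_BLOCKS)


def transform_line_break(before: str, after: str) -> str:
    """Replacement for a line break between two characters.

    The break is discarded only when BOTH neighbours are in the
    space-discarding set; otherwise it becomes a single space.
    """
    if is_space_discarding(before) and is_space_discarding(after):
        return ""
    return " "


def join_lines(text: str) -> str:
    """Join the non-empty lines of text, transforming each line break."""
    lines = [line for line in text.split("\n") if line]
    if not lines:
        return ""
    out = lines[0]
    for line in lines[1:]:
        out += transform_line_break(out[-1], line[0]) + line
    return out
```

With this subset, a break between two Han characters disappears, a break with Latin text on either side becomes a space, and the blocks in question (Yijing Hexagram Symbols at U+4DC0..U+4DFF, Tai Xuan Jing Symbols at U+1D300..U+1D35F, Counting Rod Numerals at U+1D360..U+1D37F) keep a space because they are not in the list.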

kojiishi commented 4 years ago

From the example pictures submitted to Unicode, none of them use spaces to delimit words, so I'd prefer to include them, but I'm fine with excluding them if others prefer that.

dscorbett commented 4 years ago

Is it worth keeping the hexagrams’ behavior consistent with the monograms’ and trigrams’ in Miscellaneous Symbols?

r12a commented 4 years ago

> discard if it's safe to do so (Chinese / Japanese context) but not otherwise (Korean, English, etc.).

I wasn't able to find the text in https://drafts.csswg.org/css-text-3/#line-break-transform that indicates how the browser determines whether it's in a CJ context or not.

My current thinking is that it will be important to identify language settings before applying the discard rules.

r12a commented 4 years ago

For example, the counting rods block also contains Western tally marks, and it may be better to keep spaces between those if they appear on either side of a line break in English content.

frivoal commented 4 years ago

> I wasn't able to find the text in https://drafts.csswg.org/css-text-3/#line-break-transform that indicates how the browser determines whether it's in a CJ context or not.

For now, it doesn't. It could be changed to take the lang attribute into account if we wanted to introduce some notion of a language-dependent context.
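As a purely hypothetical sketch of what such a language-dependent context might look like (this is not in the spec): the function names are made up for illustration, and treating the primary language subtags `zh` and `ja` as the "CJ context" is an assumption drawn from the Chinese / Japanese wording earlier in this thread.

```python
def is_cj_context(lang: str) -> bool:
    """Hypothetical check: is the content language Chinese or Japanese?

    Only the primary subtag of the language tag is inspected,
    so "zh-Hant-TW" counts as Chinese.
    """
    primary = lang.split("-")[0].lower()
    return primary in {"zh", "ja"}


def transform_line_break_in_context(
    before_in_set: bool, after_in_set: bool, lang: str
) -> str:
    """Discard the break only in a CJ context with both sides in the set."""
    if is_cj_context(lang) and before_in_set and after_in_set:
        return ""
    return " "
```

Under this sketch the same pair of characters would lose the line break in `lang="ja"` content but keep a space in `lang="en"` or `lang="ko"` content.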

fantasai commented 4 years ago

@r12a @frivoal I think the CSSWG wanted to avoid introducing language-dependency for the space-discarding rules.

My take on this, based on @dscorbett’s comment, is to exclude these characters from the space-discarding set. Based on that I propose to close this issue as no change.

kojiishi commented 4 years ago

Looks good to me.

css-meeting-bot commented 4 years ago

The CSS Working Group just discussed [css-text-3] Should enclosed counting rods / tai xuan jing / yi jing hexagrams be space-discarding?, and agreed to the following:

RESOLVED: Close no change

The full IRC log of that discussion
<dael> Topic: [css-text-3] Should enclosed counting rods / tai xuan jing / yi jing hexagrams be space-discarding?
<dael> github: https://github.com/w3c/csswg-drafts/issues/4993#issuecomment-633723924
<dael> fantasai: Line breaks between these character categories are dropped. Do we include these symbols in that set? Prop in issue is no
<dael> fantasai: Reason is to keep hexagrams consistent with misc symbols block. koji and I think this is good idea, checking with WG. Prop: close no change
<dael> astearns: Richard's opinion?
<dael> fantasai: Mentioned counting rods might be used in western context so keeping space is better idea. That's in favor of no change
<dael> astearns: Other comments?
<dael> astearns: Prop: Close no change to current spec
<florian> +1
<dael> astearns: Anything clarifying?
<dael> fantasai: No, it's an explicit list of codepoints
<dael> RESOLVED: Close no change