w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-text][text-autospace] Is halfwidth Kana "non-ideographic letters"? #9471

Open kojiishi opened 10 months ago

kojiishi commented 10 months ago

According to section 8.5.2, Text Spacing Character Classes, halfwidth Kana belong to "non-ideographic letters". This means an auto-space would be inserted between ideographs and halfwidth Kana. This seems like an oversight.

The question is whether they should be "ideographs" or not. Terminology-wise, Kana is derived from Han, so it is closer to ideographs than to non-ideographs. Behavior-wise, I don't think authors expect an auto-space between ideographs/fullwidth Kana and halfwidth Kana.

So I think we should classify them as belonging to neither "ideographs" nor "non-ideographic letters". I'll check with JLTF about this separately.
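A minimal sketch of the proposed classification, assuming Python's `unicodedata` module as the source of character properties. The helper names (`autospace_class`, `needs_autospace`) are hypothetical. Treating every Wide letter as an ideograph is a simplification borrowed from a later suggestion in this thread, not the spec's definition; it would, for instance, also put Hangul in the ideograph class, which is still an open question here.

```python
import unicodedata

# Halfwidth Katakana letters and sound marks, U+FF66..U+FF9F (range end is exclusive).
HALFWIDTH_KANA = range(0xFF66, 0xFFA0)


def autospace_class(ch: str) -> str:
    """Return 'ideograph', 'non-ideographic', or 'neither' for one character.

    Hypothetical rules for illustration only:
    - halfwidth Kana belong to neither class (the change proposed in this issue);
    - letters with East_Asian_Width 'W' count as ideographs;
    - other letters and decimal digits count as non-ideographic.
    """
    if ord(ch) in HALFWIDTH_KANA:
        return "neither"
    category = unicodedata.category(ch)
    if category.startswith("L") and unicodedata.east_asian_width(ch) == "W":
        return "ideograph"
    if category.startswith("L") or category == "Nd":
        return "non-ideographic"
    return "neither"


def needs_autospace(before: str, after: str) -> bool:
    """True when an auto-space would go between two adjacent characters."""
    pair = {autospace_class(before), autospace_class(after)}
    return pair == {"ideograph", "non-ideographic"}


if __name__ == "__main__":
    print(needs_autospace("漢", "A"))  # True: ideograph + Latin letter
    print(needs_autospace("漢", "ｱ"))  # False: ideograph + halfwidth Katakana
    print(needs_autospace("ア", "ｱ"))  # False: fullwidth + halfwidth Katakana
```

Halfwidth Kana have East_Asian_Width 'H' and a letter category, so without the first check in `autospace_class` they would fall into "non-ideographic" and get an auto-space next to ideographs, which is the behavior described at the top of this issue.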

kojiishi commented 10 months ago

Japanese discussion here

kojiishi commented 10 months ago

Three opinions were given in the Japanese discussion, and all agreed:

frivoal commented 10 months ago

@kojiishi https://github.com/w3c/csswg-drafts/pull/9503 tries to address this (and some other things). Can you have a look?

kojiishi commented 10 months ago

/cc @clqsin45 @nt1m

nt1m commented 10 months ago

cc @fantasai @vitorroriz for WebKit input.

frivoal commented 10 months ago

As pointed out in https://github.com/w3c/csswg-drafts/issues/9501#issuecomment-1779704918, we might have a related question about Bopomofo. By the way, how about Hangul? Maybe it should be treated similarly to Bopomofo? Should these be placed in the ideographs category, in non-ideographic letters, or in neither, like halfwidth katakana?

What about other East Asian scripts, such as:

All of these have East_Asian_Width set to 'Wide' (W), not 'Fullwidth' (F).

kojiishi commented 10 months ago

how about Hangul?

@jungshik Please see the comment above. Would Korean readers want 1/8em auto-spacing between Hangul and Han ideographs?

What about other east asian scripts, such as:

I would prefer not to go too far. This isn't a logical choice but a convention that has been used for a long time, so there is no right answer, and we can likely update the set based on future feedback.

fantasai commented 9 months ago

@frivoal Khitan, Nushu, and Tangut should be categorized as ideographic. In fact, all Wide characters should be categorized this way. I don't think users would expect spacing between them and Han ideographs.

xfq commented 9 months ago

I agree with Khitan and Tangut (and maybe Jurchen and Classical Yi).

However, Nüshu is vastly different from Han characters, and its ideal character frame is rectangular, with the width of the character less than the height of the character. We will discuss this in the clreq group.

r12a commented 9 months ago

I'm no expert on Khitan Small Script, but I think it is a bit special, since characters are arranged in two-dimensional groups separated by spaces. Here's a sample (the blue highlight shows a space):

[Screenshot: Khitan Small Script sample, with a space highlighted in blue]

https://r12a.github.io/scripts/samples/index.html?script=kits

xfq commented 9 months ago

The Khitan small script sequences do have spaces between them, and the spaces are there to terminate the words.

The problem with the Noto font, though, is that it deviates from the aesthetic and design of the script. The width of the top part and the bottom part should be the same. The following screenshot is a correct example:

[Image: KSS_Vertical_Word, a correctly proportioned Khitan Small Script word]

If there is a zero-width space, or no space at all, between runs of text in Khitan Small Script and non-ideographic letters/numerals, the layout engine should add the extra spacing. If there is already a space (U+0020) between the text in Khitan Small Script and non-ideographic letters/numerals, there is no need for the layout engine to add extra spacing.
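The rule above can be sketched as follows. This is a hypothetical illustration, not any engine's implementation: it treats the Khitan Small Script block (U+18B00..U+18CFF) as the Khitan class, uses ASCII letters and digits as a stand-in for "non-ideographic letters/numerals", and returns the string indices before which extra spacing would be inserted. The function names are made up for this example.

```python
# Khitan Small Script block, U+18B00..U+18CFF (range end is exclusive).
KHITAN_SMALL_SCRIPT = range(0x18B00, 0x18D00)
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def is_khitan(ch: str) -> bool:
    return ord(ch) in KHITAN_SMALL_SCRIPT


def is_alnum_stand_in(ch: str) -> bool:
    # Simplification: ASCII letters/digits stand in for non-ideographic letters/numerals.
    return ch.isascii() and ch.isalnum()


def autospace_points(text: str) -> list[int]:
    """Indices in `text` before which extra spacing would be added.

    A ZWSP is skipped, so it does not block the extra spacing; a U+0020 space
    becomes the previous character and therefore does block it.
    """
    points = []
    prev = None  # previous character, ignoring ZWSP
    for i, ch in enumerate(text):
        if ch == ZWSP:
            continue
        if prev is not None and (
            (is_khitan(prev) and is_alnum_stand_in(ch))
            or (is_alnum_stand_in(prev) and is_khitan(ch))
        ):
            points.append(i)
        prev = ch
    return points


if __name__ == "__main__":
    kss = "\U00018B00"  # first character of the Khitan Small Script block
    print(autospace_points(kss + "A"))         # [1]: no space, add spacing
    print(autospace_points(kss + ZWSP + "A"))  # [2]: ZWSP, still add spacing
    print(autospace_points(kss + " A"))        # []: U+0020 already separates
```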

xfq commented 8 months ago

FWIW, there is also a proposal in Unicode about this: https://www.unicode.org/L2/L2023/23283-auto-spacing-prop.pdf