[css-fonts] Font fallback for (Unicode) decomposable characters is browser-dependent

spencer246 commented 4 months ago

It seems that the CSS Fonts Module does not fully specify how browsers should select a font for a grapheme when (1) the grapheme consists of a single Unicode codepoint X, (2) X is canonically decomposable into codepoint Y, and (3) the font can render Y but not X.

Note that the single-codepoint condition is important here: Section 5.3 of the spec mandates that if a grapheme is a multi-codepoint sequence whose NFC normalization is Y, browsers must check whether the font can render Y before they move on to the next font in the font-family list.

However, it remains unclear whether the rule in Section 5.3 should be applied as well in the case where a codepoint does not belong to a multi-codepoint grapheme cluster or a Unicode variation sequence. In fact, Chrome and Firefox do not agree on this issue; the two browsers render the following simple HTML+CSS snippet differently.

https://codepen.io/spencer246/pen/bGPdqdQ

The above page tries to render U+F992 using Noto Sans TC. U+F992 is a CJK Compatibility Ideograph that canonically decomposes into U+6F23. Many fonts cover U+6F23 but not U+F992, and Noto Sans TC is one such font.
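The decomposition itself can be checked independently of any browser. Below is a minimal sketch using Python's standard `unicodedata` module; the variable names are only illustrative. Because U+F992 has a singleton canonical decomposition, NFC and NFD are expected to give the same result.

```python
import unicodedata

compat = "\uF992"  # CJK COMPATIBILITY IDEOGRAPH-F992

# Canonical decomposition mapping recorded in the Unicode Character Database.
mapping = unicodedata.decomposition(compat)

# Singleton decompositions are never recomposed, so NFC and NFD should agree.
nfc = unicodedata.normalize("NFC", compat)
nfd = unicodedata.normalize("NFD", compat)

print("decomposition mapping:", mapping)
print("NFC:", [f"U+{ord(c):04X}" for c in nfc])
print("NFC == NFD:", nfc == nfd)
```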

In the above figure, the first glyph is U+F992 and the second is U+6F23.

On Firefox, since Noto Sans TC cannot render U+F992, the character is rendered with the next font in the font stack (text-security-circle), which renders any codepoint as a small circle.

On Chrome, however, when the engine notices that Noto Sans TC cannot render U+F992, it checks whether it can render the canonically equivalent codepoint U+6F23, and thus U+F992 is rendered as a CJK Ideograph rather than a small circle.
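The difference between the two behaviors can be sketched with a toy model. This is not either engine's actual code: fonts are reduced to a name plus a coverage predicate, the coverage of Noto Sans TC is assumed to be just U+6F23 for illustration, and `select_font` and `try_equivalent` are hypothetical names.

```python
import unicodedata
from typing import Callable, List, Optional, Tuple

# Toy font model: a name plus a predicate saying whether a codepoint has a glyph.
Font = Tuple[str, Callable[[int], bool]]

FONT_STACK: List[Font] = [
    ("Noto Sans TC", lambda cp: cp == 0x6F23),  # assumed coverage, illustration only
    ("text-security-circle", lambda cp: True),  # renders any codepoint as a circle
]


def select_font(ch: str, stack: List[Font], try_equivalent: bool) -> Optional[str]:
    """Return the name of the font chosen for a single-codepoint grapheme."""
    nfc = unicodedata.normalize("NFC", ch)
    for name, supports in stack:
        if supports(ord(ch)):
            return name
        # Chrome-like step: before moving to the next font, check whether this
        # font supports the canonically equivalent single codepoint.
        if try_equivalent and len(nfc) == 1 and supports(ord(nfc)):
            return name
    return None


firefox_like = select_font("\uF992", FONT_STACK, try_equivalent=False)
chrome_like = select_font("\uF992", FONT_STACK, try_equivalent=True)
print("Firefox-like:", firefox_like)
print("Chrome-like:", chrome_like)
```

Under these assumptions, the only difference between the two results is whether the canonical equivalent is consulted per font, before fallback continues down the stack.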

Is this browser-dependent behavior of the font fallback algorithm already known and considered acceptable?

2-1. If it is, the spec should state explicitly how a font is selected for canonically decomposable Unicode characters.

2-2. If it is not, please consider specifying the desired behavior. In my opinion, Firefox-like behavior is preferable because it matches the variation sequence case:

For sequences containing variation selectors, which indicate the precise glyph to be used for a given character, user agents always attempt installed font fallback to find the appropriate glyph before using the default glyph of the base character.

To be consistent with the above, a canonically decomposable character (e.g., a CJK Compatibility Ideograph) should be matched against all fonts in the font-family list before NFC or NFD is applied to it.
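One possible reading of this proposal is a two-pass lookup, sketched below. It is only an illustration of the ordering, not spec text: fonts are modeled as sets of covered codepoints, and `select_font_two_pass`, `BaseOnlyFont`, and `CompatFont` are hypothetical names. Using the normalized form as a last resort in the second pass is an assumption about what should happen when no font covers the original codepoint.

```python
import unicodedata
from typing import List, Optional, Set, Tuple

# Toy font model: (name, set of covered codepoints). Names are hypothetical.
Font = Tuple[str, Set[int]]


def select_font_two_pass(ch: str, stack: List[Font]) -> Optional[Tuple[str, str]]:
    """Return (font name, text to shape) for a single-codepoint grapheme."""
    # Pass 1: try the original codepoint against every font in the list.
    for name, coverage in stack:
        if ord(ch) in coverage:
            return name, ch
    # Pass 2 (assumed last resort): only now try the canonical equivalent.
    nfc = unicodedata.normalize("NFC", ch)
    if nfc != ch and len(nfc) == 1:
        for name, coverage in stack:
            if ord(nfc) in coverage:
                return name, nfc
    return None


base_only: Font = ("BaseOnlyFont", {0x6F23})
compat_font: Font = ("CompatFont", {0xF992})

# A later font that covers U+F992 wins over an earlier font that covers only U+6F23.
with_compat = select_font_two_pass("\uF992", [base_only, compat_font])
# With no font covering U+F992, the canonical equivalent is used instead.
without_compat = select_font_two_pass("\uF992", [base_only])
print(with_compat, without_compat)
```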

svgeesus commented 4 months ago

I agree that the specification does not cover this case of canonical equivalents between compatibility characters and their equivalent codepoints or codepoint sequences. In particular, the compatibility characters are likely to be outside the effective character map.

We should define this case in the spec.

@jfkthame @drott thoughts?

kidayasuo commented 1 month ago

We discussed this at the JLReq TF meeting on 2024-9-25.

Ideographic characters in the compatibility area are typically used to precisely spell proper nouns, such as 高田 vs. 髙田. This is similar to the spelling variation between ‘Smith’ and ‘Smithe’ in English family names. We now have IVS, a better mechanism to express such variations. However, not only do we need to continue supporting existing data, but it is also likely that compatibility ideographs will continue to be used.

For this reason, the JLReq TF believes that compatibility ideographs should be treated similarly to variation sequences in order to preserve the intended variations.

w3c / csswg-drafts

[css-fonts] Font fallback for (Unicode) decomposable characters is browser-dependent #10565