[css-ruby-1] Should auto-hide match use NFKC and/or strip white space?

fantasai commented 3 years ago

https://lists.w3.org/Archives/Public/www-style/2016Dec/0108.html raised some use cases for ruby auto-hiding other than strict string equality. Many of the examples would require custom rules (which could be done manually with visibility: collapse), but some of these could be automatically solved by stripping white space and/or matching via NFKC. Should we enable such normalization for auto-hiding string comparison?
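To make the question concrete, here is a minimal Python sketch that compares a base and an annotation string three ways: strict equality, optional white-space stripping, and optional NFC or NFKC normalization through the standard `unicodedata` module. The function name `texts_match` and its parameters are hypothetical and do not come from any spec. The sketch also assumes that "white space" means ASCII document white space (space, tab, LF, CR, FF), trimmed only at the ends.

```python
import unicodedata
from typing import Optional

# Assumption: only ASCII document white space, trimmed at the ends.
# Meaningful characters such as U+3000 IDEOGRAPHIC SPACE are left alone.
CSS_WHITESPACE = " \t\n\r\f"

def texts_match(base: str, annotation: str,
                strip_ws: bool = False,
                form: Optional[str] = None) -> bool:
    """Hypothetical helper: compare base and annotation under one regime."""
    if strip_ws:
        base = base.strip(CSS_WHITESPACE)
        annotation = annotation.strip(CSS_WHITESPACE)
    if form is not None:  # "NFC" or "NFKC"
        base = unicodedata.normalize(form, base)
        annotation = unicodedata.normalize(form, annotation)
    return base == annotation

# Differs only by source-formatting white space
print(texts_match("\n  東京\n", "東京"))                 # False: strict equality
print(texts_match("\n  東京\n", "東京", strip_ws=True))  # True
# Fullwidth question mark (U+FF1F) vs ASCII "?"
print(texts_match("？", "?", form="NFC"))   # False: NFC keeps the fullwidth form
print(texts_match("？", "?", form="NFKC"))  # True: NFKC folds it to "?"
```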

patrickdark commented 3 years ago

I expanded upon the cited email at https://github.com/w3c/csswg-drafts/issues/5927#issuecomment-779562023.

Even though it wouldn't solve all of the cited issues, I think Unicode normalization is probably a good idea. It would address the fullwidth versus normal width punctuation cases between CJK and Latin languages where I would expect a match (and autohiding). I also would expect Hangul characters built from component characters (Hangul Jamo) to match the precomposed versions, though I'm not sure where that would occur in practice.
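The two cases in this comment fall under different normalization forms, which becomes relevant later when the thread weighs NFC against NFKC. In the following illustrative Python sketch, which uses only the standard `unicodedata` module, the fullwidth comma matches the ASCII comma only under NFKC, while a syllable spelled with conjoining Hangul Jamo already matches its precomposed form under NFC:

```python
import unicodedata

# Fullwidth comma (U+FF0C) vs ASCII comma: a compatibility difference.
fullwidth, ascii_comma = "\uFF0C", ","
print(unicodedata.normalize("NFC", fullwidth) == ascii_comma)   # False
print(unicodedata.normalize("NFKC", fullwidth) == ascii_comma)  # True

# The syllable 한 spelled with conjoining Jamo (U+1112 CHOSEONG HIEUH,
# U+1161 JUNGSEONG A, U+11AB JONGSEONG NIEUN) vs precomposed U+D55C:
# a canonical difference, so NFC is enough.
jamo, syllable = "\u1112\u1161\u11AB", "\uD55C"
print(jamo == syllable)                                # False
print(unicodedata.normalize("NFC", jamo) == syllable)  # True
```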

It would also allow matching in a weird case I ran into. I deliberately used a base letter plus a combining diacritic instead of the precomposed accented letter, because the precomposed character was missing from a designer font I was using and the combining sequence looked better. In that case I was using ruby but didn't need to match anything. I would nevertheless expect the combining sequence to match the precomposed character in a comparison.
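As a small illustration, this Python sketch uses an assumed example letter (é), not the one from the comment. A base letter followed by a combining accent and the precomposed letter are canonically equivalent but are different code point sequences, so they match only after normalization:

```python
import unicodedata

decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE

print(decomposed == precomposed)                                # False
print(len(decomposed), len(precomposed))                        # 2 1
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```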

fantasai commented 1 year ago

I can't find the minutes, so maybe @r12a or @aphillips or @frivoal can confirm if this discussion I remember actually happened. :) But IIRC the i18nWG concluded that NFKC would be too aggressive in at least some cases, but wanted to know if the CSSWG would consider NFC and/or ignoring white space.
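The following Python sketch gives a rough sense of what "too aggressive" can mean. Its samples are illustrative and are not the specific cases the i18nWG discussed. NFKC folds away distinctions that NFC preserves, such as circled digits, superscripts, ligatures, halfwidth katakana, and squared era names:

```python
import unicodedata

# Characters that NFC leaves unchanged but NFKC rewrites.
samples = ["\u2460", "\u00B2", "\uFB01", "\uFF76", "\u337B"]  # ① ² ﬁ ｶ ㍻
for ch in samples:
    nfc = unicodedata.normalize("NFC", ch)
    nfkc = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} NFC={nfc!r} NFKC={nfkc!r}")
```

For example, under NFKC the annotation "1" would match a base of "①", and "平成" would match "㍻".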

aphillips commented 1 year ago

@fantasai It's here in our TPAC minutes--almost exactly as you remember it :-). Search for the word "hide" and the conversation proceeds from there.

css-meeting-bot commented 1 year ago

The CSS Working Group just discussed [css-ruby-1] Should auto-hide match use NFKC and/or strip white space?, and agreed to the following:

RESOLVED: only perform whitespace stripping before comparing the base and annotation texts
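A minimal Python sketch of the resolved rule: strip white space, then compare code points exactly, with no NFC or NFKC step. The resolution does not say which characters count as white space or whether interior white space is affected. This sketch therefore assumes ASCII document white space trimmed at the ends only, and the name `should_auto_hide` is hypothetical.

```python
# Assumption: leading/trailing space, tab, LF, CR, FF only; interior
# white space and U+3000 IDEOGRAPHIC SPACE are left untouched.
DOCUMENT_WHITESPACE = " \t\n\r\f"

def should_auto_hide(base_text: str, annotation_text: str) -> bool:
    """Hypothetical helper: strip white space, then require exact equality.
    No Unicode normalization (neither NFC nor NFKC) is applied."""
    return (base_text.strip(DOCUMENT_WHITESPACE)
            == annotation_text.strip(DOCUMENT_WHITESPACE))

print(should_auto_hide("  ひらがな\n", "ひらがな"))  # True: only formatting white space differs
print(should_auto_hide("e\u0301", "\u00E9"))          # False: canonically equivalent, not normalized
print(should_auto_hide("？", "?"))                    # False: no compatibility folding
```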

The full IRC log of that discussion

<fremy> fantasai: we have a feature in ruby where, if the annotation text and the base are identical, the annotation is hidden when they are presented on top of each other
<fremy> fantasai: but if they are, for example, side by side, both are kept
<fremy> fantasai: the question is "what is identical"?
<fremy> fantasai: should we normalize before doing this?
<fremy> fantasai: should we deal with white space
<fremy> fantasai: should we collapse unicode characters that merge in rendering if possible? (NFKC)
<fremy> fantasai: but the internationalization group thought it might be too aggressive in some cases
<fremy> fantasai: they recommended NFC instead
<TabAtkins> q+
<fremy> fantasai: which only deals with simpler things (e.g. A + a combining accent vs. a precomposed accented A)
<florian> q+
<fremy> fantasai: so, do we want to perform NFC before comparing the texts?
<astearns> ack TabAtkins
<fremy> TabAtkins: I support whitespace stripping
<fremy> TabAtkins: because it can be due to source code formatting
<fremy> TabAtkins: but I don't think we should do NFC because we don't do this elsewhere
<fremy> TabAtkins: I expect that authors use the same typing convention in the same markup
<fremy> TabAtkins: we are not comparing html vs css
<astearns> ack florian
<fremy> florian: I agree about whitespace
<fremy> florian: for normalization, I'm less sure
<fremy> florian: if one person types the text, and another the annotations
<fremy> florian: NFC is not very aggressive, I think it would make things more rational for users
<fremy> florian: however, it will be rare I think
<fremy> florian: but if it did occur, I think the correct behavior is to normalize
<fremy> florian: (so, preference for NFC, but not strong)
<jfkthame> +1 to nfc
<fremy> astearns: can we resolve on stripping whitespace, and leave off normalization?
<heycam> q+
<fremy> fantasai: I think yes, I agree with TabAtkins, we don't do it elsewhere
<fremy> fantasai: so it seems ok to drop this
<astearns> ack heycam
<fremy> heycam: this is just a content check, correct?
<fremy> heycam: we don't look at display:none etc... ?
<fremy> fantasai: we might be looking at display:none?
<fantasai> s/TabAtkins, we don't do it elsewhere/TabAtkins and Florian: it's definitely the right thing to do, but it's also not done elsewhere in the platform and is quite rare to mismatch/
<fremy> florian: but not generated content etc
<astearns> jfkthame: would you be OK not doing NFC, or would you prefer we resolve to use NFC?
<fremy> heycam: okay, hopefully the spec is very clear on that
<fremy> astearns: reading IRC comments
<fantasai> [note: those of us on the call are somewhat ambivalent about NFC, given pros and cons]
<jfkthame> astearns: I'd be ok with not, though I think it's less good (sorry, in another meeting)
<heycam> (I kind of don't quite understand the need for this automatic hiding, and why the author doesn't use visibility:hidden on ruby text that they know is the same as the base text)
<fremy> astearns: okay, since we have lots of doubts on NFC, let's just do whitespace and leave it at that
<fantasai> heycam, it's because whether it should be invisible or not depends on how it's styled
<fremy> florian: and also put an action on me to clarify the display:none behavior
<fantasai> heycam, and there's no selector for "this is the same text as the other thing" :)
<heycam> ok
<fantasai> heycam, plus it's what you want by default so we should do it by default
<fremy> astearns: so, the proposed resolution would be to only perform whitespace stripping before comparing the base and annotation texts
<fremy> astearns: any objection?
<fremy> RESOLVED: only perform whitespace stripping before comparing the base and annotation texts
<heycam> text-transform? :o
<fantasai> "The content comparison for auto-hiding takes place prior to white space collapsing (white-space) and text transformation (text-transform) and ignores elements (considers only the textContent of the boxes)."
<fremy> ACTION florian: make sure the spec is clear about how to determine which text is being compared (display:none, etc.)

aphillips commented 1 year ago

(responding to the IRC log discussion in the comment above)

Note that I18N spent a long time creating a document about string matching, Charmod-Norm. When specifying string matching, or when deciding what normalization (if any) to apply, consider referencing the best practices found there. In particular, I18N recommends against performing Unicode normalization for most matching regimes. I think our previous half-hearted recommendation to look at NFC for ruby base matching came out of a TPAC discussion in which NFKC was being considered. But upon reflection, if the base and ruby text were encoded differently and matched only after NFC normalization, treating them as different would be unsurprising. It is also a pretty rare corner case in any event. The only case that springs to mind is the handling of dakuten marks in Japanese, which are sometimes encoded as combining characters, and even then the difference might be intentional??
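To illustrate the dakuten case, this Python sketch compares hiragana が encoded as か plus a combining voiced sound mark with its precomposed form. The strings are equal only after NFC, so under the resolved whitespace-only rule they would be treated as different:

```python
import unicodedata

combining = "\u304B\u3099"  # か + COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
precomposed = "\u304C"      # が HIRAGANA LETTER GA

print(combining == precomposed)                                # False: exact match
print(unicodedata.normalize("NFC", combining) == precomposed)  # True: only after NFC
```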

w3c / csswg-drafts

[css-ruby-1] Should auto-hide match use NFKC and/or strip white space? #5995