r12a closed this issue 4 years ago
This wasn't present when the mail thread you referred to was open, but now https://drafts.csswg.org/css-text-3/#text-encoding says:
CSS is built on [UNICODE]. UAs that support Unicode must adhere to all normative requirements of the Unicode Core Standard, except where explicitly overridden by CSS.
This means that UAX14 is to be followed, as much as any other part of Unicode is. Most parts of UAX14 are tailorable, which is effectively the same as a "SHOULD" requirement. I don't think we should go any further than that, for the reasons argued by @fantasai in the mail thread (note that it already goes further than what she was willing to accept back then).
In other words, at a normative level, the spec already does what you are requesting.
This should be sufficient grounds to write tests for wpt (with the "should" flag) and to file bugs against browsers when they fail those tests, although we should expect that some of these bugs may be closed as WONTFIX when browsers have a good reason to deviate from UAX14.
That said, when doing this, I would recommend focusing on situations that are known to be problematic rather than writing exhaustive checklists for all code points, bearing in mind @kojiishi's point in the thread about web compat and @fantasai's point that UAX14 is only a baseline that in many cases ought to be tailored. Deviating from UAX14 can be justified for web-compat reasons or because of desirable tailorings. But since browser code was historically not based on UAX14, not all differences are documented, and finding out whether a difference is accidental or intentional can be expensive. Bearing that cost in cases known to be problematic is justified, but starting from the assumption that all divergences are bad is likely to result in a lot more work than is actually desirable.
On an editorial level, maybe we can make things a little more obvious. How about rephrasing the note at the end of https://drafts.csswg.org/css-text-3/#line-breaking from:
Further information on line breaking conventions can be found in [JLREQ] and [JIS4051] for Japanese, [CLREQ] and [ZHMARK] for Chinese, and in [UAX14] for all scripts in Unicode. See also the Internationalization Working Group’s Typography Index [TYPOGRAPHY] which includes more information on additional languages.
to
[UAX14] defines a baseline behavior for line breaking for all scripts in Unicode, which is expected to be further tailored. Further information on line breaking conventions can be found in [JLREQ] and [JIS4051] for Japanese, [CLREQ] and [ZHMARK] for Chinese. See also the Internationalization Working Group’s Typography Index [TYPOGRAPHY] which includes more information on additional languages.
The i18n WG discussed this during their telecon today, and agreed that we would be satisfied if Florian added the proposed edit. Thanks.
Moving an old thread on the www-style list to GitHub to keep it visible. Please read that thread before replying here.
The spec defines behaviour for several UAX14 line-break classes, such as IN and PR in CJK text, and WJ, ZW, and GL in general text. But nothing in the spec recommends that, in the absence of other rules in the spec or in an implementation, the UAX14 line-break properties should be applied as a fallback. The only mention is a note which simply says that further information can be found for all scripts in UAX14.
The i18n WG believes that recommending this (as a fallback only, like the UCA for sorting when no tailoring is available) will:
We encourage tailoring of the rules for particular contexts, and propose this only as a kind of 'safety net'.
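To make the 'safety net' idea concrete, here is a minimal Python sketch of the lookup order being proposed: a tailoring wins when it exists, and the UAX14 class is used only when nothing else applies. The function name `line_break_class`, the `tailoring` parameter, and the example tailoring are hypothetical and purely illustrative; the table is a five-entry excerpt of the UAX14 data, not a usable implementation.

```python
from typing import Dict, Optional

# Tiny excerpt of UAX14 line-break classes, covering only the classes
# named in this issue. A real implementation would load the full
# LineBreak.txt data file instead of this hand-written table.
UAX14_CLASS: Dict[str, str] = {
    "\u2060": "WJ",  # WORD JOINER
    "\u200B": "ZW",  # ZERO WIDTH SPACE
    "\u00A0": "GL",  # NO-BREAK SPACE
    "\u2026": "IN",  # HORIZONTAL ELLIPSIS
    "$": "PR",       # DOLLAR SIGN
}


def line_break_class(ch: str,
                     tailoring: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the line-break class for ch (hypothetical helper).

    A tailoring (a rule from the spec or from the implementation) takes
    precedence; the UAX14 class is only the fallback. Returns None for
    characters outside this small excerpt.
    """
    if tailoring is not None and ch in tailoring:
        return tailoring[ch]
    return UAX14_CLASS.get(ch)


# No tailoring available: the UAX14 default applies.
default_class = line_break_class("\u2026")

# Made-up tailoring, for illustration only: it overrides the default.
tailored_class = line_break_class("\u2026", tailoring={"\u2026": "AL"})
```

With no tailoring, the ellipsis keeps its UAX14 class (IN); with the made-up tailoring, the tailored class is used instead, and every character the tailoring does not mention still falls back to UAX14.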