w3c / clreq

Requirements for Chinese Text Layout
https://www.w3.org/International/clreq/

UAX #14 for line-breaking with quotation marks #245

Open r12a opened 4 years ago

r12a commented 4 years ago

In Chinese, “ is an opening quotation mark, but in some languages (like German) it can be a closing quotation mark. In UAX #14:

LB19 Do not break before or after quotation marks, such as ‘ ” ’.

This is too strict for Chinese: because “ is an opening mark there, it can appear at the start of a line, so a break before it should be allowed.
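To make the problem concrete, here is a minimal Python sketch of the untailored rule. It assumes only two classes matter: the curly quotes are QU, and CJK Unified Ideographs (U+4E00 to U+9FFF) stand in for ID. It is not a UAX #14 implementation, and `lb_class` and `lb19_allows_break` are hypothetical helpers.

```python
# Minimal sketch of the *untailored* UAX #14 LB19 rule: QU never breaks on
# either side. Assumption: only ID and QU are modelled; this is not a full
# line-breaking algorithm.

QU = {"\u2018", "\u2019", "\u201C", "\u201D"}  # ‘ ’ “ ”


def lb_class(ch: str) -> str:
    """Very rough classifier: curly quotes are QU, CJK Unified Ideographs are ID."""
    if ch in QU:
        return "QU"
    if 0x4E00 <= ord(ch) <= 0x9FFF:
        return "ID"
    return "XX"  # anything else is outside this sketch


def lb19_allows_break(before: str, after: str) -> bool:
    """Return True if a break is allowed between two adjacent characters."""
    if lb_class(before) == "QU" or lb_class(after) == "QU":
        return False  # LB19: do not break before or after quotation marks
    # Simplification: ideographs may break between each other; other pairs never do.
    return lb_class(before) == "ID" and lb_class(after) == "ID"


text = "他说“你好”"
breaks = [i for i in range(1, len(text)) if lb19_allows_break(text[i - 1], text[i])]
print(breaks)
```

The only break opportunities found are after 他 and after 你 (positions 1 and 4); the position before “ is suppressed even though a Chinese line may start with it.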

Specs: In the description for QU in UAX #14:

Some quotation characters can be opening or closing, or even both, depending on usage. The default is to treat them as both opening and closing. [...] Note: If language information is available, it can be used to determine which character is used as the opening quote and which as the closing quote. See the information in Section 6.2, General Punctuation, in [Unicode]. In such a case, the quotation marks could be tailored to either OP or CL depending on their actual usage.

And css-text also allows the UA to determine the set of line-breaking restrictions to use, so it allows for this kind of tailoring.

Tests & results: Interactive test. In Chinese, breaks before “ are allowed.

Bug report: ICU

r12a commented 4 years ago

The first comment in this issue contains text that will automatically appear in the Chinese gap-analysis document as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

xfq commented 4 years ago

Related issues:

frivoal commented 4 years ago

Line breaking and prohibitions in CSS are supposed to be language sensitive. So it would be perfectly reasonable for UAs to allow-before/forbid-after if lang=zh but allow-after/forbid-before if lang=de. Most of UAX #14 is "should" level as well, so it allows for this kind of tailoring.
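The language-sensitive tailoring described above could look roughly like this in Python. The per-language role tables are illustrative assumptions, not data taken from UAX #14 or css-text; the helper names are hypothetical, and ID is again approximated by the U+4E00 to U+9FFF block.

```python
# Sketch: resolve ambiguous QU quotes to OP or CL from the content language,
# then apply the usual prohibitions (no break after OP, none before CL).
# The role tables below are illustrative assumptions, not normative data.

QU_QUOTES = {"\u2018", "\u2019", "\u201C", "\u201D"}  # ‘ ’ “ ”
LOW9_QUOTES = {"\u201A", "\u201E"}  # ‚ „ : already OP in UAX #14

QUOTE_ROLES = {
    # Chinese: “…” and ‘…’
    "zh": {"\u201C": "OP", "\u2018": "OP", "\u201D": "CL", "\u2019": "CL"},
    # German: „…“ and ‚…‘, so “ and ‘ are closing marks
    "de": {"\u201E": "OP", "\u201A": "OP", "\u201C": "CL", "\u2018": "CL"},
}


def resolve_class(ch: str, lang: str) -> str:
    """Return a simplified line-break class, tailored by primary language subtag."""
    roles = QUOTE_ROLES.get(lang.split("-")[0].lower(), {})
    if ch in roles:
        return roles[ch]
    if ch in QU_QUOTES:
        return "QU"  # no tailoring known: keep the ambiguous class
    if ch in LOW9_QUOTES:
        return "OP"
    if 0x4E00 <= ord(ch) <= 0x9FFF:
        return "ID"
    return "XX"


def break_allowed(before: str, after: str, lang: str) -> bool:
    b, a = resolve_class(before, lang), resolve_class(after, lang)
    if b == "QU" or a == "QU":
        return False  # untailored LB19
    if b == "OP":
        return False  # no break after an opening mark
    if a == "CL":
        return False  # no break before a closing mark
    # Within this sketch, any remaining ID/OP/CL pair may break; XX never does.
    return "XX" not in (b, a)


def break_positions(text: str, lang: str) -> list:
    return [i for i in range(1, len(text)) if break_allowed(text[i - 1], text[i], lang)]


text = "他说“你好”"
print(break_positions(text, "zh-Hans"))
print(break_positions(text, "de"))
```

Running the same (artificially tagged) string under both languages shows the two behaviours: for zh the extra break opportunity falls before “ (positions [1, 2, 4]), while for de, where “ closes a quotation, it falls after it (positions [1, 3, 4]).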

With that said, the remaining question is whether this should be left for individual UAs to figure out, or if it should be specified somewhere. The general feeling in the CSS WG has been that specifying this sort of thing is just too big a scope, involves too much original research as soon as we step off the major languages (and even there, it might need non-trivial work to figure out what's desired), and that specifying this kind of thing in css-text is just not practical.

xfq commented 4 years ago

In the description for QU:

Some quotation characters can be opening or closing, or even both, depending on usage. The default is to treat them as both opening and closing. [...] Note: If language information is available, it can be used to determine which character is used as the opening quote and which as the closing quote. See the information in Section 6.2, General Punctuation, in [Unicode]. In such a case, the quotation marks could be tailored to either OP or CL depending on their actual usage.

And css-text also allows the UA to determine the set of line-breaking restrictions to use, so it does indeed allow this kind of tailoring.

Testing in Chrome, Firefox, and Safari, I found that browsers already appear to treat the quotation marks as OP and CL when the text is in Chinese (change the lang to de to compare the results). I think we can close this issue if this rule does not need to be added to the CSS line-breaking rules (I agree that specifying this kind of thing in css-text is not practical).

fantasai commented 4 years ago

I'd probably go further and say that breaks should be allowed between ID + Pi and between Pf + ID unless language information indicates otherwise. The UAX #14 rule controlling this is workable in space-separated languages, but it doesn't work very well as a default when there are no spaces.
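The suggested default can be sketched in Python as below. The function name and the `lang_overrides` flag are hypothetical; general categories come from the standard `unicodedata.category`, and ID is approximated by the CJK Unified Ideographs block, which is an assumption rather than the real ID class.

```python
import unicodedata

# Sketch of the proposed default: allow ID ÷ Pi and Pf ÷ ID unless language
# information says otherwise. ID is approximated by the CJK Unified
# Ideographs block (an assumption); Pi/Pf come from the general category.


def is_ideograph(ch: str) -> bool:
    return 0x4E00 <= ord(ch) <= 0x9FFF


def quote_break_allowed(before: str, after: str, lang_overrides: bool = False) -> bool:
    """Return True if the proposed default allows a break between the pair.

    lang_overrides is a hypothetical flag meaning "language information
    says otherwise"; the pair is then left to LB19 or a language tailoring.
    """
    if lang_overrides:
        return False
    if is_ideograph(before) and unicodedata.category(after) == "Pi":
        return True  # ID ÷ Pi, e.g. 说 ÷ “
    if unicodedata.category(before) == "Pf" and is_ideograph(after):
        return True  # Pf ÷ ID, e.g. ” ÷ 然
    return False


text = "他说“你好”然后"
positions = [i for i in range(1, len(text)) if quote_break_allowed(text[i - 1], text[i])]
print(positions)
```

Only the breaks this rule adds are reported: before “ (position 2) and after ” (position 6). Breaks between the quotes and the text they enclose remain prohibited.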

fantasai commented 4 years ago

Reported to Unicode, with a link back to this issue.