w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-text-4] text-spacing needs to handle non-fullwidth punctuation also #6091

Open fantasai opened 3 years ago

fantasai commented 3 years ago

“京都（日本）” This doesn't collapse between the closing parenthesis and the closing quote, but it really ought to. (Note that curly quotes are used in Chinese, and are rendered fullwidth even though they use the same codepoint.)

xfq commented 3 years ago

See clreq § 3.1.6.1 and § 3.1.6.2 for requirements for Chinese layout.

macnmm commented 3 years ago

JLReq TF is currently discussing this class of issues. Font information is needed to augment the spacing rules of Unicode code points when the code point could be full-width or not depending on the font, and the collapse of extra space to achieve a zero-point is a necessary first step of correct glyph spacing adjustment (the second being to selectively restore some spacing).

fantasai commented 2 years ago

The single and double curly quotation marks are explicitly called out in the spec already, because of their usage in Chinese. So this issue is focused on other punctuation, such as non-fullwidth brackets, guillemets, etc. E.g. {京都（日本）}

kidayasuo commented 2 years ago

We've discussed this at the JLReq TF meeting and developed a note (in Japanese). In sum, when a proportional opening bracket is placed before a fullwidth opening bracket (cl-01 in JLReq), and when a proportional closing bracket is placed after a fullwidth closing bracket or a fullwidth fullstop & comma (cl-02/06/07), the extra space within the fullwidth character should be removed.

e.g. these cases [「 」] 。]
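
A minimal Python sketch of the rule described above may help make it concrete. It is only an illustration: the character sets are small hand-picked subsets of the JLReq classes, "proportional bracket" is approximated as a Unicode Ps/Pe character whose East Asian Width is not F or W, and the function names are hypothetical.

```python
import unicodedata

# Illustrative, incomplete subsets; the real JLReq classes are larger.
FULLWIDTH_OPENING = set("「『（［｛〔〈《【")  # subset of cl-01
FULLWIDTH_CLOSING = set("」』）］｝〕〉》】")  # subset of cl-02
FULLWIDTH_STOPS = set("。．、，")              # subset of cl-06 / cl-07


def is_proportional_bracket(ch: str, general_category: str) -> bool:
    """Approximation (an assumption of this sketch): a bracket of the given
    category ("Ps" = opening, "Pe" = closing) that is not East Asian F/W."""
    return (
        unicodedata.category(ch) == general_category
        and unicodedata.east_asian_width(ch) not in ("F", "W")
    )


def trims_fullwidth_space(prev: str, nxt: str) -> bool:
    """True if the blank part of the fullwidth glyph in the pair (prev, nxt)
    would be removed under the rule summarized in the comment above."""
    # Proportional opening bracket before a fullwidth opening bracket.
    if is_proportional_bracket(prev, "Ps") and nxt in FULLWIDTH_OPENING:
        return True
    # Proportional closing bracket after a fullwidth closing bracket,
    # full stop, or comma.
    if prev in (FULLWIDTH_CLOSING | FULLWIDTH_STOPS) and is_proportional_bracket(nxt, "Pe"):
        return True
    return False


print(trims_fullwidth_space("[", "「"))  # True
print(trims_fullwidth_space("。", "]"))  # True
print(trims_fullwidth_space("「", "["))  # False
```

Note that the check is purely code-point based; as mentioned earlier in the thread, whether a glyph actually carries extra space depends on the font.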

himorin commented 1 year ago

> We've discussed this at the JLReq TF meeting and developed a note (in Japanese). In sum, when a proportional opening bracket is placed before a fullwidth opening bracket (cl-01 in JLReq), and when a proportional closing bracket is placed after a fullwidth closing bracket or a fullwidth fullstop & comma (cl-02/06/07), the extra space within the fullwidth character should be removed.

The JLReq TF has finally agreed to update the jlreq(-d) spacing property document (a version defined using Unicode properties is under development), based on the direction above. This update will cover all pairs of non-fullwidth punctuation (as in Latin script, extended to the whole of Unicode) and fullwidth punctuation (East Asian Width F or W, and included in the JLReq definitions), as:

for more, see e.g.: https://github.com/w3c/jlreq/issues/340#issuecomment-1302021065
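
As a rough illustration of the "East Asian Width is F or W" criterion, the sketch below uses Python's standard `unicodedata` module to find adjacent punctuation pairs in which exactly one character is fullwidth. The helper names are hypothetical, and the sketch skips the additional JLReq class membership condition mentioned above, so it is broader than the actual definition.

```python
import unicodedata
from typing import List, Tuple


def is_punctuation(ch: str) -> bool:
    # Any Unicode general category starting with "P" (Ps, Pe, Pi, Pf, Po, ...).
    return unicodedata.category(ch).startswith("P")


def is_fullwidth(ch: str) -> bool:
    # East_Asian_Width of Fullwidth (F) or Wide (W).
    return unicodedata.east_asian_width(ch) in ("F", "W")


def mixed_width_pairs(text: str) -> List[Tuple[int, str]]:
    """Return (index, pair) for each adjacent pair of punctuation characters
    where one is fullwidth and the other is not."""
    pairs = []
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if is_punctuation(a) and is_punctuation(b) and is_fullwidth(a) != is_fullwidth(b):
            pairs.append((i, a + b))
    return pairs


print(mixed_width_pairs("{京都（日本）}"))  # [(6, '）}')]
```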

frivoal commented 1 year ago

Added a commit that should handle non-CJK "proportional" bracketing punctuation: cc2ee4cf8

It does not handle non-CJK stops and commas like ）. (full width closing parenthesis followed by ASCII period) or 」, (full width right corner bracket followed by ASCII comma), or equivalent things in other writing systems.

Should it? If so, how do we identify the relevant set(s) of characters?

It also doesn't handle ambiguous punctuation from the Pf / Pi categories (like ‘ ’ or « ») other than double curly quotes which are already special-cased due to Chinese. That's probably unavoidable though.
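
For reference, the Pi / Pf classification of these characters can be inspected with Python's standard `unicodedata` module. This is only a quick check, and `is_ambiguous_quote` is a hypothetical helper name, not anything from the spec. The general category alone does not say whether such a mark is acting as an opening or a closing mark in a given text, which is why they are hard to handle.

```python
import unicodedata


def is_ambiguous_quote(ch: str) -> bool:
    """True for initial (Pi) or final (Pf) quotation punctuation."""
    return unicodedata.category(ch) in ("Pi", "Pf")


for ch in "‘’«»“”":
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_ambiguous_quote(ch))
```

Brackets such as `(` and `「` are Ps rather than Pi, so this check returns False for them.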

xfq commented 1 year ago

The CLReq TF discussed this issue and concluded that the spacing around non-fullwidth punctuations should not be changed in Chinese, because these are not part of Chinese punctuation and should be handled according to the rules of Western punctuations.

We'd be happy to continue discussing a possible solution with the JLReq TF and the CSSWG, for example, different behaviour depending on the language?

MurakamiShinyu commented 1 year ago

I understand that some Chinese fonts, such as "Microsoft JhengHei", have fullwidth brackets centered within the full width, and that this causes problems if they are trimmed next to adjacent non-fullwidth punctuation.

How about Korean text layout? In Korean texts, non-fullwidth punctuations are used primarily, and also fullwidth brackets are used occasionally. So I thought that the text-spacing handling of adjacent fullwidth and non-fullwidth punctuations might be important for Korean text.

I checked Korean Wikipedia articles that contain adjacent fullwidth and non-fullwidth punctuations.

sample1

… 잉글랜드 유복한 집안에서 태어나 런던으로 이주하고서 본격 작품 활동을 시작하여 일약 명성을 얻었고, 생전에 '영국 최고의 극작가' 지위에 올랐다. 《로미오와 줄리엣》, 《햄릿》처럼 인간 내면을 통찰한 걸작을 남겼으며, …

sample2

… 로버트 듀발, 리처드 해리스 주연의 1993년 영화 《월터와 프랭크》(원제 《헤밍웨이와 레슬링하기》)는 플로리다의 해안 마을에서 은퇴한 두 친구의 우정을 다루었다. …
  • Hemingway, Ernest (2013). 《헤밍웨이 단편선(1~2 합본)》. 번역 김욱동. …
  • Hemingway, Ernest (2013) [1932]. 《오후의 죽음》. 번역 장왕록. …

These examples use 《 (U+300A LEFT DOUBLE ANGLE BRACKET) and 》 (U+300B RIGHT DOUBLE ANGLE BRACKET), which are East_Asian_Width=Wide, but interestingly these fullwidth brackets appear to be half width in my browser's default Korean font setting (Korean font: "Apple SD Gothic Neo").
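
The Unicode properties of these two brackets can be checked with a short Python snippet (`describe` is a hypothetical helper name). The properties are font-independent, so they cannot reveal the half-width rendering seen with a particular font.

```python
import unicodedata


def describe(ch: str):
    """Return (name, general category, East_Asian_Width) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.east_asian_width(ch),
    )


for ch in "\u300A\u300B":
    print(f"U+{ord(ch):04X}", *describe(ch))
```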

Screenshots, with Korean font "Apple SD Gothic Neo":

[Images: ko-sample1-AppleSDGothicNeo, ko-sample2a-AppleSDGothicNeo, ko-sample2b-AppleSDGothicNeo]

If such Korean fonts are very common, text-spacing-trim would not be necessary for Korean text. However, I found that this is not the case for other Korean fonts, such as AppleGothic, AppleMyungjo, Noto Sans KR, or Source Han Sans K.

Screenshots, with Korean font "AppleGothic":

[Images: ko-sample1-AppleGothic, ko-sample2a-AppleGothic, ko-sample2b-AppleGothic]

In these results, the spacing between adjacent fullwidth and non-fullwidth punctuation does not look very good.

We will need to hear from Korean text layout experts.

MurakamiShinyu commented 1 year ago

For the example case above, text-spacing: trim-all may be a better option:

xfq commented 1 year ago

> The CLReq TF discussed this issue and concluded that the spacing around non-fullwidth punctuations should not be changed in Chinese, because these are not part of Chinese punctuation and should be handled according to the rules of Western punctuations.

We discussed this issue again during the I18N ⇔ CSS Call and another CLReq Editors' Call, and we're OK with trimming the spacing around non-fullwidth punctuations in Chinese (i.e., the current JLReq/CSSWG consensus).