w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-text] The definition of ideographs includes punctuation marks #9501

Open xfq opened 10 months ago

xfq commented 10 months ago

https://drafts.csswg.org/css-text-4/#ideographs

The current definition of ideographs includes two Chinese punctuation marks, U+16FE2 and U+16FE3 (see the UnicodeSet), but no other punctuation marks. These characters are not used in modern Chinese, but scholars use them for textual processing, electronic interchange, and publication of ancient Chinese texts.

For example, if a section, clause, or sentence ends in Western text, text-autospace enlarges the spacing between that Western text and the following U+16FE2 punctuation mark, which is not the result users want to see.

There should be some examples of classical Chinese mixed with Western text from the early days of the Republic of China, but it will take some time to find specific ones.

fantasai commented 10 months ago

So probably the definition of "ideographs" should include only Letters, Numbers, Symbols, and Marks (excluding Punctuation, Spaces, and Control characters). Does that sound right?
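To make the proposed filter concrete, here is a minimal Python sketch, assuming that "Letters, Numbers, Symbols, and Marks" refers to the Unicode General_Category major classes L, N, S, and M. The names `KEPT_MAJOR_CLASSES` and `passes_category_filter` are hypothetical and not taken from the spec or the PR, and the sketch covers only the category test, not whatever script or block restrictions the full definition applies.

```python
import unicodedata

# Major General_Category classes kept under the proposal (assumption):
# L = Letters, M = Marks, N = Numbers, S = Symbols.
# Excluded by omission: P = Punctuation, Z = Separators (spaces), C = Other (controls etc.).
KEPT_MAJOR_CLASSES = {"L", "M", "N", "S"}


def passes_category_filter(ch: str) -> bool:
    """Return True if ch's General_Category is a Letter, Mark, Number, or Symbol."""
    # unicodedata.category returns a two-letter code such as "Lo" or "Po";
    # the first letter is the major class.
    return unicodedata.category(ch)[0] in KEPT_MAJOR_CLASSES


if __name__ == "__main__":
    # Print the category and filter result for a few sample characters,
    # including the two code points discussed in this issue.
    for ch in ["中", "。", " ", "\U00016FE2", "\U00016FE3"]:
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), passes_category_filter(ch))
```

The categories reported for U+16FE2 and U+16FE3 depend on the Unicode version bundled with the Python interpreter, so printing them is a quick way to check whether a category-based filter would actually exclude both characters.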

frivoal commented 10 months ago

@xfq https://github.com/w3c/csswg-drafts/pull/9503 tries to address this (and some other things). Can you have a look?

Clqsin45 commented 10 months ago

A similar case might be using Bopomofo directly in text (non-annotation usage).

I think that, according to https://www.w3.org/International/clreq/#chinese_and_western_mixed_text_composition, no spacing should be inserted between Bopomofo and Chinese characters, since Bopomofo characters are not Western characters. The current version asks for such spacing, though, and I think the PR should address that too (?).

frivoal commented 10 months ago

@Clqsin45 I've posted an answer to your point above in https://github.com/w3c/csswg-drafts/issues/9471. This is a related topic, but I felt it might be better to keep this issue about punctuation, and that issue about classes of letters we might have put in the wrong category.