w3c / clreq

Requirements for Chinese Text Layout
https://www.w3.org/International/clreq/

Fullwidth is not equivalent to em #511

Open groverlynn opened 1 year ago

groverlynn commented 1 year ago

The em is defined as the total height of the quadrate. Assuming CJK glyphs share the same height, the width of a CJK wide character is 1 em if and only if the font uses exactly square character cells for wide characters. Square cells are no longer guaranteed for CJK characters: more and more fonts use slightly narrow rectangles (e.g. 980/1000 as the width), or slightly wide rectangles for heavier variants. In such cases, the CJK full-width is no longer 1 em, nor is the CJK half-width 1 en.

This raises a series of issues. Firstly, em and full-width should be properly distinguished, as should en and half-width. em and en are units that depend on height (regardless of CJK or non-CJK), whereas full-width and half-width depend on East Asian Width, which is defined not only for CJK characters and symbols.

Secondly, non-CJK characters that Unicode defines and describes in terms of em or en should probably not be recommended for use in CJK contexts. For instance, the punctuation mark "dash" (破折號) should be two horizontal bars (U+2015) rather than two em dashes (U+2014) or one two-em dash (U+2E3A), which is in any case an ellipsis (omission) mark rather than a dash.

Besides the width problem, those characters are meant to sit on the alphabetic baseline, which is not the same as the ideographic baseline, and this causes vertical alignment problems within the cell. For typesetting reasons, sharing alphabetic punctuation with ideographic characters should never be the first choice.
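
To make the 980/1000 example above concrete, here is a minimal sketch of the arithmetic. The function name and the two sample fonts are hypothetical, and the em is assumed to equal the font's units-per-em design height.

```python
from fractions import Fraction

def advance_in_em(advance_units: int, units_per_em: int) -> Fraction:
    """Express a glyph advance as a fraction of the em (the design height)."""
    return Fraction(advance_units, units_per_em)

# Hypothetical fonts: one square, one slightly narrow (980/1000).
square = advance_in_em(1000, 1000)
narrow = advance_in_em(980, 1000)

print(square == 1)                   # True: full-width is 1 em only for the square design
print(narrow)                        # 49/50
print(narrow / 2 == Fraction(1, 2))  # False: half-width is 0.49 em, not 1 en (0.5 em)
```

Exact fractions are used so the comparison with 1 em and 1 en is not blurred by floating-point rounding.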

realfish commented 1 year ago

A related comment: https://github.com/w3c/clreq/pull/466#issuecomment-1133584925. FYI.

groverlynn commented 1 year ago

A related comment: #466 (comment). FYI.

@realfish You may say it's related, but I am advocating the exact opposite. Given the rising popularity of non-square CJK fonts, including the Mac's default system fonts and Adobe's popular Source Han series (a.k.a. Google's Noto series), em and en should not be used in CJK typesetting to mean full-width and half-width.

AmeroHan commented 1 year ago

Since the conventional meaning of em differs from what this document intends, and this document is not aimed only at Chinese users, I think we need to either explain “em” or replace it with another term.

A unit defined in CSS Values and Units Module Level 4 is worth mentioning here and may help:

ic

Represents the typical advance measure of CJK letters, and measured as the used advance measure of the “水” (CJK water ideograph, U+6C34) glyph found in the font used to render it.
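
As a rough illustration of the quoted definition, the sketch below scales the advance of "水" to a font size. It is not how any browser implements `ic`; the function name, the dictionary-based inputs, and the sample metrics are all assumptions made for the example, and the case where the font lacks the glyph is left to the caller.

```python
WATER = 0x6C34  # U+6C34 "水", the reference glyph named in the definition

def ic_in_px(cmap, advances, units_per_em, font_size_px):
    """Return the advance of "水" scaled to font_size_px, or None if absent.

    cmap:     code point -> glyph name (hypothetical input shape)
    advances: glyph name -> advance in font units (hypothetical input shape)
    """
    glyph = cmap.get(WATER)
    if glyph is None:
        return None  # fallback behaviour is out of scope for this sketch
    return advances[glyph] * font_size_px / units_per_em

# Made-up metrics for a slightly narrow font (980 units on a 1000-unit em).
cmap = {WATER: "uni6C34"}
advances = {"uni6C34": 980}
print(ic_in_px(cmap, advances, 1000, 16))  # 15.68
```

With these made-up metrics, 1ic at a 16px font size would be 15.68px rather than the 16px that 1em gives.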

acli commented 1 year ago

CJK glyphs have never been guaranteed to be square, even way back in dot matrix days. Back then, width : height could easily have been closer to 2:1 (when printing in horizontal mode).

Em and CJK space are identical only when typesetting vertically. I have been saying this for a long time.

macnmm commented 1 year ago

The design space of the font, defined in UPM, has been equivalent to the CJK embox (the design space used for ideographs, etc.) in the case of square fonts. It is true that the font design space now does not necessarily have the same dimensions or aspect ratio as the CJK embox. Therefore, font developers need to make use of the BASE table entries for the ideographic metrics, so that layout software can align glyphs to the correct embox metric (e.g. embox center alignment, or aligning square CJK fonts next to non-square fonts on the same line).

macnmm commented 1 year ago

This means that the UPM design space (normally the same height as the point size of the font) can differ from the CJK embox (cf. some squat-looking Chinese fonts, or variable fonts that change both height and width). Layout software needs to support alignment of glyphs on different baselines, not only on the glyph origin (Roman baseline). We will be explaining these issues in more detail in JLReq so that support for non-square or variable CJK fonts can be complete.
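
One way to picture embox center alignment is the sketch below. It assumes each font supplies the em-box bottom and top edges (the `ideo` and `idtp` baseline tags of the OpenType BASE table) in font units; the function names and the sample values are hypothetical, and y is taken to increase upwards.

```python
def embox_center(ideo, idtp, units_per_em, font_size):
    """Centre of the ideographic em-box above the glyph origin, in output units."""
    return (ideo + idtp) / 2 * font_size / units_per_em

def shift_to_align_centers(reference, other, font_size):
    """Upward shift to apply to `other` so its em-box centre matches `reference`.

    Each font is an (ideo, idtp, units_per_em) tuple; both are set at font_size.
    """
    return embox_center(*reference, font_size) - embox_center(*other, font_size)

# Made-up metrics: a square em-box and a squat one, both on a 1000-unit em.
square_font = (-100, 900, 1000)  # em-box height 1000 units
squat_font = (-100, 800, 1000)   # em-box height 900 units
print(shift_to_align_centers(square_font, squat_font, 20))  # 1.0
```

Aligning only on the glyph origin would leave the two em-box centres 1 unit apart at this size, which is the kind of offset the BASE metrics let layout software correct.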

macnmm commented 1 year ago

Narrow or wide CJK fonts also pose an issue when adjusting the space in terms of "fullwidth" percentages -- normally this is assumed to be a percentage of the point size or of the em, but as you point out this is not the case. Each monospaced CJK font has its own measurement of "fullwidth", and the percentage adjustments should be based on that width.
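
The difference between the two bases can be sketched as follows. The function names and the 980/1000 sample font are assumptions for illustration, not taken from any layout engine.

```python
def space_from_em(percent, font_size):
    """Spacing computed as a percentage of the em (that is, of the font size)."""
    return font_size * percent / 100

def space_from_fullwidth(percent, font_size, advance_units, units_per_em):
    """Spacing computed as a percentage of the font's own fullwidth advance."""
    return font_size * advance_units * percent / (units_per_em * 100)

# A 50% adjustment at size 20 for a made-up font whose fullwidth is 980/1000 em.
print(space_from_em(50, 20))                    # 10.0
print(space_from_fullwidth(50, 20, 980, 1000))  # 9.8
```

For a square font the two results coincide; for a narrow or wide font they drift apart in proportion to the fullwidth-to-em ratio.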

xfq commented 8 months ago

See also https://github.com/w3c/jlreq-d/issues/35