w3c / font-text-cg

Font metrics to report blanks in CJK punctuation #45

Open frivoal opened 3 years ago

frivoal commented 3 years ago

Is there, in OpenType (and related technologies), some metric that lets you know, for CJK characters like , or others that contain a large blank, how much of the character is blank (and on which side(s) the blank is)?

As a first approximation, we could write a fixed list like this one (it is not exhaustive):

However:

So this doesn't seem like something we can hard-code in Unicode (or in a spec that wants to use these things); it seems it has to be readable from font metrics.

So, does this information exist already, and if not, can it be created?

I suspect that the halt and vhal features are not what I am looking for, as they fit a fullwidth character into a halfwidth space, which does get rid of the blank part, but doesn't tell you where that blank part was, nor guarantee that the non-blank part is left undistorted by this operation.

Use cases for this would be supporting ruby overhang in the manner described in Simple Ruby, section 3.2, bullet 4, or fullwidth collapsing: https://drafts.csswg.org/css-text-4/#fullwidth-collapsing

lianghai commented 3 years ago
  • this varies per font (some center ?, some left-align it leaving a blank right)

Yep, e.g., fonts from Founder Type, the largest type foundry in mainland China, notoriously have their and etc. centered.

No, there isn’t a dedicated field for this in fonts today, but dynamically checking side bearings may be easier for everyone anyway. You know how we can’t even trust the underline metrics in Latin fonts.
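For reference, a minimal Python sketch of the side-bearing check, assuming the glyph's advance width and ink bounding box have already been read from the font (for example from its hmtx table and glyph outlines). The `GlyphMetrics` type, the `blank_sides` function and the 0.25 threshold are illustrative assumptions, not anything defined by OpenType:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlyphMetrics:
    """Horizontal metrics for one glyph, in font units.

    Assumption: these were read from the font beforehand (advance width
    from 'hmtx', ink extents from the glyph's bounding box).
    """
    advance: int  # horizontal advance width
    x_min: int    # left edge of the inked bounding box
    x_max: int    # right edge of the inked bounding box


def blank_sides(m: GlyphMetrics, threshold: float = 0.25) -> tuple[float, float, str]:
    """Estimate how much of the advance is blank on each side.

    Returns (left_fraction, right_fraction, side), where side is one of
    "left", "right", "both" or "none". A side counts as blank when its side
    bearing is at least `threshold` of the advance (a hypothetical cut-off).
    """
    if m.advance <= 0:
        raise ValueError("advance must be positive")
    lsb = m.x_min              # left side bearing
    rsb = m.advance - m.x_max  # right side bearing
    left = lsb / m.advance
    right = rsb / m.advance
    left_blank = left >= threshold
    right_blank = right >= threshold
    if left_blank and right_blank:
        side = "both"
    elif left_blank:
        side = "left"
    elif right_blank:
        side = "right"
    else:
        side = "none"
    return left, right, side
```

Under these assumptions, a full stop drawn in the lower left of a 1000-unit em (ink from 100 to 400) comes out as "right", while a centered design (ink from 350 to 650) comes out as "both". The bounding box only reflects the ink, so this says where the blank is, not what the designer intended.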

macnmm commented 3 years ago

I have something in mind for this that I have floated with Koji-san in a thread and could write up as a more formal proposal...

Introduce two new OTF features 'jtsu/vjts' and 'jaki/vjak'; deprecate the proposed features 'chws/vchw':

The issue my idea is trying to solve is that the JIS X 4051 standard currently dictates that certain fullwidth punctuation be treated as half-width, but fonts today include many more punctuation characters than are explicitly specified, and they also do not agree on the width and design of codepoints in the ambiguous range (e.g. U+2xxx). So what started as a spec for width adjustments based on codepoint actually requires font and glyph information to do correctly. The 'jtsu/vjts' feature would be a way for fonts to specify this adjustment to the JIS X 4051 zero-point, eliminating differences across text engines that hard-code their own behavior or have unique heuristics. I have chosen 'j' as the prefix (and based the 3-character names on the Japanese terms "aki" and "tsume") because this issue is specific to the Japanese JIS standard and its unique scheme.
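To make the codepoint-versus-glyph problem concrete, here is a rough Python sketch of how an engine might compute the zero-point position of a punctuation glyph, preferring per-glyph trims from the font (standing in for what a 'jtsu'-style feature might provide) and falling back to a hard-coded codepoint list. The table contents, the glyph names and the `zero_point` helper are hypothetical:

```python
# Hypothetical list of codepoints an engine might hard-code as fullwidth
# punctuation carrying half an em of blank, with the side the blank is on.
# These sides match typical Japanese glyph designs, not every font.
HARDCODED_BLANK_SIDE = {
    "\u3001": "right",  # IDEOGRAPHIC COMMA
    "\u3002": "right",  # IDEOGRAPHIC FULL STOP
    "\u300C": "left",   # LEFT CORNER BRACKET
    "\u300D": "right",  # RIGHT CORNER BRACKET
}


def zero_point(
    ch: str,
    glyph_name: str,
    advance: int,
    em: int,
    font_trims: dict[str, tuple[int, int]],
) -> tuple[int, int]:
    """Return (x_placement, new_advance) for one glyph at the zero point.

    `font_trims` stands in for per-glyph data a font-level feature could
    supply: blank to remove on the (left, right) side, in font units.
    When the font says nothing, fall back to the hard-coded codepoint list.
    """
    if glyph_name in font_trims:
        left, right = font_trims[glyph_name]
    elif ch in HARDCODED_BLANK_SIDE:
        half = em // 2
        left, right = (half, 0) if HARDCODED_BLANK_SIDE[ch] == "left" else (0, half)
    else:
        left, right = 0, 0
    # Shift the ink left by the trimmed left blank; narrow the advance by both.
    return -left, advance - left - right
```

With no font data, a centered full stop gets (0, 500) from the codepoint list, so its ink (say 350 to 650) spills past the new advance; if the font supplies a (250, 250) trim for that glyph, the result is (-250, 500) and the ink stays inside.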

Having established a way for the JIS X 4051 zero-point to be informed by the font data, there is the issue of varying levels of support for mojikumi aki spacing adjustments away from the JIS X 4051 zero-point. Some engines may only require a basic adjustment of +1/2 em all the time; InDesign requires much more variation and control over when and how much is added. For the basic case, the +1/2 em logic could be performed contextually using an OTF feature, hence the introduction of the 'jaki/vjak' feature. Engines will always differ in how they adjust spacing; they could use OTF features or their own more complex logic. I don't think using either feature would be inconsistent; some apps would simply opt not to use the second one.
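A simplified Python sketch of the kind of contextual "+1/2 em" logic a 'jaki'-style feature could encode, starting from glyphs already at the zero point. The three classes, the boundary rule and the `aki_after` name are assumptions for illustration; they are not the proposed feature's actual lookups, and they ignore line edges and most JIS X 4051 classes:

```python
from typing import Literal, Sequence

Cls = Literal["open", "close", "other"]


def aki_after(classes: Sequence[Cls], em: int = 1000) -> list[int]:
    """Space (font units) to add after each glyph, starting from the zero point.

    Hypothetical rule: at the boundary between two adjacent glyphs, add back
    half an em when a blank faces the boundary (closing punctuation on the
    left, or opening punctuation on the right) and the two classes differ.
    """
    result = [0] * len(classes)
    for i in range(len(classes) - 1):
        left, right = classes[i], classes[i + 1]
        blank_faces_boundary = left == "close" or right == "open"
        if blank_faces_boundary and left != right:
            result[i] = em // 2
    return result
```

For example, the sequence ideograph, closing bracket, opening bracket, ideograph yields [0, 500, 0, 0]: a single half em between the two brackets, rather than one from each side.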

frivoal commented 3 years ago

Actually, I think I was mistaken, and halt is what I am looking for; I had confused it with hwid. hwid would have the issues I am talking about, but halt seems fine.

But now that @lianghai and @macnmm have agreed there's an issue, I suspect there's more complexity to this problem than I initially thought, but I'm not sure what's missing. @macnmm Could you explain a bit more (we touched on this in the latest JLREQ-TF, but I'm still not quite following)?

macnmm commented 3 years ago

I think that if you want all the fullwidth punctuation reduced to half width for layout purposes, halt will do the trick. If you want to know where the space is in the glyph design, backing into it from halt with some math seems expensive to me, but it could be done.
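The math itself is small for the horizontal case. A sketch in Python, assuming the XPlacement and XAdvance values of the glyph's halt adjustment have already been extracted from the font's GPOS table (walking the lookups is the expensive part and is not shown); `HaltAdjustment` and `blank_from_halt` are hypothetical names:

```python
from typing import NamedTuple


class HaltAdjustment(NamedTuple):
    """Single-adjustment values a font's 'halt' feature applies to one glyph."""
    x_placement: int  # shift of the drawn glyph (negative = moved left)
    x_advance: int    # change to the advance width (negative = narrower)


def blank_from_halt(adj: HaltAdjustment) -> tuple[int, int]:
    """Infer (left_blank, right_blank) in font units from a 'halt' adjustment.

    Assumes halt only removes blank space: moving the glyph left by
    -x_placement removes that much blank on the left, and the rest of the
    total width reduction (-x_advance) must have come off the right side.
    """
    left = -adj.x_placement
    right = -adj.x_advance - left
    if left < 0 or right < 0:
        raise ValueError("adjustment does not look like a pure blank trim")
    return left, right
```

For instance, an opening bracket adjusted by (-500, -500) yields (500, 0), a closing bracket adjusted by (0, -500) yields (0, 500), and a centered glyph adjusted by (-250, -500) yields (250, 250). The vertical case (vhal) is not covered here.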

My issue (and I hijacked your issue's slot on the agenda to discuss it) is that chws is not usable by sophisticated layout apps: it basically performs mojikumi-like spacing adjustments on punctuation when they are contiguous, but does no other adjustment, so it would be incompatible with other adjustment logic. I seek a feature that addresses the problem of the mojikumi class being tied to the glyph and not the codepoint, yet we have no standard way of categorizing glyphs for this purpose. I thought we could use halt to reach the JIS X 4051 zero-point; from there, unsophisticated apps would use a new feature, jaki, to add back aki spacing when the punctuation is adjacent to a glyph of a different class. Sophisticated apps would have their own logic for this and not use the jaki feature.

However, in subsequent discussions it appears chws still has its adherents for the simplest use cases (a single style run in UI or other text), and the JLReq TF is discussing the need for fonts to somehow include character class information for glyphs, so that engines can do sophisticated layout. To what extent should fonts specify spacing, in a world where spacing is not one-size-fits-all? Perhaps fonts should instead specify class, and engines decide to what extent they support mojikumi...
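As a rough illustration of the "fonts specify class, engines decide spacing" split, a Python sketch in which the font contributes only a glyph-to-class map and each engine brings its own aki table. No such class field exists in fonts today, so the map, the glyph names, both tables and `boundary_spacing` are hypothetical:

```python
from typing import Mapping, Sequence

# Hypothetical per-glyph class data a font could carry. Keyed by glyph name
# rather than codepoint, so two glyphs for one character can differ.
EXAMPLE_FONT_CLASSES = {
    "uni300C": "open",   # LEFT CORNER BRACKET
    "uni300D": "close",  # RIGHT CORNER BRACKET
    "uni3002": "close",  # IDEOGRAPHIC FULL STOP, Japanese-style design
}

# Two hypothetical engine policies built on the same class data.
BASIC_AKI = {("close", "other"): 500, ("other", "open"): 500, ("close", "open"): 500}
TIGHT_AKI = {("close", "other"): 250, ("other", "open"): 250, ("close", "open"): 250}


def boundary_spacing(
    glyphs: Sequence[str],
    font_classes: Mapping[str, str],
    aki_table: Mapping[tuple[str, str], int],
) -> list[int]:
    """Space (font units) the engine adds at each boundary between glyphs."""
    # The font only says what class each glyph is; unknown glyphs are "other".
    classes = [font_classes.get(g, "other") for g in glyphs]
    # The engine's own table decides how much aki each class pair gets.
    return [aki_table.get((a, b), 0) for a, b in zip(classes, classes[1:])]
```

Given the same glyph run (ideograph, closing bracket, opening bracket, ideograph), the basic table yields [0, 500, 0] and the tight table [0, 250, 0]; the font data is unchanged, only the engine's policy differs.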