jcitpc / CJKFont

Definition of “mono-spaced CJK” #2

Open kidayasuo opened 11 months ago

kidayasuo commented 11 months ago

The palt_kern proposal needs a better definition for the term “mono-spaced CJK” that it uses.

There seem to be two ways of defining it. One is to use the Script property in the UCD. The other is to somehow determine whether the specific font is mono-spaced (and if so, for which code points?). Although the former method turns off kern even for truly proportional CJK fonts, it might be more efficient and more consistent for users. What do you think?

Below is a discussion copied from Teams:

Yasuo Kida

Nat McCully (Guest)-san, or anyone else, one more question. The proposal uses phrases such as "scripts other than mono-spaced CJK" and "shall not be activated on monospaced (e.g. CJK script) glyphs". Is there a way to tell whether a font is such a mono-spaced CJK font?

Takaaki Fuji

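Something caught my attention while reading, so let me ask a question. Does what Nat McCully (Guest)-san calls monospaced glyphs correspond to what Japanese users call "InDesign's 和文等幅 glyphs" (Japanese fixed-width glyphs)? If so, I found it a little unclear who defines the "width". Is it determined by a table like EastAsianWidth.txt that the app carries, or from each font's hmtx/vmtx? I read it as the former, so although kida (Guest)-san uses the term monospace CJK fonts, I took it to be unrelated to the latter, hmtx/vmtx. What do you think?

Also, out of curiosity, I skimmed the Gecko patch that Murakami-san mentioned in csswg-drafts #6723. As I read it, Firefox splits (itemizes) a text run by UCD script and, if the run's script belongs to CJK, disables kern by default. As a result, with font-kerning: auto you effectively get the equivalent of InDesign's 和文等幅.

My guess, as a mere user, is that in the mental model of InDesign's 和文等幅, determining the "CJK character width" per glyph is the first important step in deciding whether palt applies, and that the same attribute is then also used to decide whether kern applies, and so on. If, on the other hand, Gecko's model controls kern ON/OFF at script boundaries without introducing the notion of character width, then it seems to treat the unit of processing as a larger chunk.

So, though this is a layman's thought, for the script/language sensitivity of kern I wondered whether wording such as "by default shall not be activated on glyphs for CJK-specific scripts..." could explain the app's responsibility simply while avoiding any mention of "width". However, if we also take font-side circumstances into account, such as fonts whose hiragana and katakana are proportional by default (e.g. P Mincho, UI Gothic), there may also be room to go further and specify something like "except the case any of the glyphs in each script are designed as proportional".

That is all; please excuse this personal comment. I apologize if I have misunderstood anything.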

Yasuo Kida

Takaaki Fuji (Guest)-san, so there are two approaches: splitting by script, or looking at character width. Which is better, I wonder?

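However, as you added at the end, if we try to support Japanese fonts whose kana are not fixed-width, does that lead to the conclusion that script alone cannot decide, and that we must look at character width, at least indirectly?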

Takaaki Fuji

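I am in neither Adobe's position nor Mozilla's, just a bystander, so it is very hard for me to answer, but here is my personal opinion.

  a. Look at the character width
    a1. Use the app's own knowledge about code points/CIDs to decide whether a character is a "CJK full-width character" (without looking at the font)
    a2. Decide whether it is a "CJK full-width character" from the advance widths of each font's glyphs
  b. Split by script

What I meant was that, divided finely, there seem to be about three kinds. But if the goal is to achieve the 和文等幅 behavior, I feel there could be some other, smarter approach, and that this could be left as an implementation detail. So my thought was: if we say, for example, "for glyphs of CJK-specific scripts (hiragana, katakana, Han, Hangul, Bopomofo, ...), kern is disabled by default", wouldn't that explain the actual requirement while permitting various implementations? However, this ignores cases such as "may full-width Latin be classified as Latin?", so I am not confident that it holds up...

As for detecting "Japanese fonts whose kana are not fixed-width", I think we can only rely on heuristics. But the case I added at the end seems very rare, so personally I thought that "do not consider it; kern is disabled" would be fine, and that not considering it would in fact make the behavior more consistent.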

Yasuo Kida

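I see. Indeed. Always turning kern off for Japanese scripts, for the sake of consistency, may be a good approach. I wonder whether China and Korea would agree as far as Han characters are concerned?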

takaakifuji commented 11 months ago

I am posting my own notes on what I learned from the recent chat in Teams, hoping they help further discussion.

Ways to determine CJK glyphs exempted from applying kern by default

As Nat-san describes, applications are now expected to ignore kerning for monospaced/full-width CJK glyphs. But how?

  1. Table driven
    • Each app brings its own hard-coded table (sort of EastAsianWidth.txt) that helps to decide which characters are full-width
    • If such full-width glyphs are involved with kern/palt, disable it by default
  2. Font metrics driven
    • Each app calculates the standard Han advance width (using an exemplar glyph such as U+6C34) from the font
    • If the advance width of the involved glyphs matches the standard Han advance width, disable kern/palt by default
  3. Script driven
    • Each app itemises the input text into runs by script, based on the UCD Script property
    • If a run belongs to a CJK script such as Hiragana, Katakana, Han, Bopomofo, etc., disable kern/palt for every character in the run by default
  4. GPOS heuristics
    • Each app analyzes which glyphs have proportional alternate metrics via the 'palt' feature in the font
    • If the involved glyphs have corresponding 'palt' entries, disable kern by default

1. is most proven to work at this point and I assume this is the one Nat-san's original proposal is based on.
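
As a rough illustration of approach 1, here is a minimal Python sketch. It assumes that the East_Asian_Width classes W and F, as exposed by Python's standard `unicodedata` module, can stand in for the app's hard-coded table; real applications may carry different data, and the function names are hypothetical.

```python
import unicodedata

# Assumption: East_Asian_Width "W" (Wide) and "F" (Fullwidth) stand in for
# the app's own hard-coded full-width table.
FULL_WIDTH_CLASSES = {"W", "F"}


def is_full_width(ch: str) -> bool:
    """Return True if the character is full-width according to the table."""
    return unicodedata.east_asian_width(ch) in FULL_WIDTH_CLASSES


def kern_allowed_by_default(left: str, right: str) -> bool:
    """Disable default kerning when either side of the pair is full-width."""
    return not (is_full_width(left) or is_full_width(right))
```

Note that the font is never consulted here, and that full-width Latin (e.g. U+FF21) counts as full-width, unlike in approach 3.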

2. is my understanding of what Kida-san suggested in the recent discussion. It is hard to reliably determine the actual width of the full-width characters in a font (consider a case like AXIS Font), and a naive implementation can treat non-CJK (e.g. Latin) glyphs as full-width, giving false positives. However, it may work if, combined with 3. discussed below, you can make sure that all glyphs in a script are fixed-width.
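
Approach 2 and its false-positive problem can be sketched as follows. The advance widths are invented sample values standing in for what an app would read from the font's hmtx table, and all names are hypothetical.

```python
# Hypothetical advance widths in font units, standing in for hmtx data.
SAMPLE_ADVANCES = {
    "\u6c34": 1000,  # Han exemplar glyph (U+6C34)
    "\u3042": 1000,  # hiragana A
    "A": 680,
    "V": 660,
    "\uff21": 1000,  # full-width Latin A
}


def standard_han_advance(advances, exemplar="\u6c34"):
    """Take the advance width of an exemplar Han glyph as the standard."""
    return advances[exemplar]


def is_fixed_width_glyph(ch, advances):
    """A glyph counts as fixed-width if it matches the standard Han advance."""
    return advances[ch] == standard_han_advance(advances)


def kern_allowed_by_default(left, right, advances):
    """Disable default kerning when either glyph matches the Han advance."""
    return not (
        is_fixed_width_glyph(left, advances)
        or is_fixed_width_glyph(right, advances)
    )
```

With these sample values the full-width Latin A matches the Han advance and is exempted from kerning, which is the kind of false positive a naive implementation runs into.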

3. is my understanding of what Firefox does to satisfy the expectation. Script itemization looks tricky, as it's hard to get the kerning right between punctuation glyphs classified as Common, and the implementation may vary among applications as we have no standards. In theory, full-width Latin and Cyrillic/Greek letters are not treated as CJK and are thus kerned by default. Murakami-san reports that Firefox works as he expects. I have only looked through a small portion of the unfamiliar codebase, so I could be wrong though.
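
A minimal sketch of approach 3 follows. It is not Firefox's implementation: Python's standard library does not expose the UCD Script property, so a tiny hand-written range table stands in for it, and attaching Common characters to the preceding run is just one possible policy. All names are hypothetical.

```python
# Assumption: rough block ranges standing in for the UCD Script property.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"), (0x0061, 0x007A, "Latin"),
    (0xFF21, 0xFF3A, "Latin"), (0xFF41, 0xFF5A, "Latin"),  # full-width Latin
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x3100, 0x312F, "Bopomofo"),
    (0x4E00, 0x9FFF, "Han"),
    (0xAC00, 0xD7A3, "Hangul"),
]
CJK_SCRIPTS = {"Hiragana", "Katakana", "Bopomofo", "Han", "Hangul"}


def script_of(ch):
    """Look up the (simplified) script of a character; default to Common."""
    cp = ord(ch)
    for lo, hi, name in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "Common"


def itemize(text):
    """Split text into (script, substring) runs.

    Common characters (e.g. punctuation) join the preceding run.
    """
    runs = []
    for ch in text:
        script = script_of(ch)
        if runs and (script == "Common" or script == runs[-1][0]):
            runs[-1] = (runs[-1][0], runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


def default_kern_flags(text):
    """Return (substring, kern_enabled) pairs: kern is off for CJK runs."""
    return [(s, script not in CJK_SCRIPTS) for script, s in itemize(text)]
```

Here full-width Latin is classified as Latin and therefore kerned by default, and the ideographic comma simply inherits the script of the run before it, which shows why punctuation between runs is the tricky part.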

We found out that 4. doesn't work because not all CJK glyphs are expected to have 'palt' entries, so 'kern' ends up being applied to random glyphs whose 'palt' entries are absent, which is not what we want. However, combined with 3., it can be used as a hint that the glyphs of a script are intended as fixed-width.
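
The difference between using 'palt' coverage alone and using it as a per-script hint can be sketched like this. The coverage set is invented sample data, and the "any glyph in the script has a 'palt' entry" rule is only an assumed reading of the hint, not something the discussion specifies.

```python
def naive_kern_allowed(glyph, palt_covered):
    """Approach 4 alone: kern any glyph that lacks a 'palt' entry."""
    return glyph not in palt_covered


def script_hinted_fixed_width(script_glyphs, palt_covered):
    """Assumed hint: a script whose glyphs include any 'palt' entry is
    treated as designed fixed-width by default."""
    return any(g in palt_covered for g in script_glyphs)


def kern_allowed_for_script(script_glyphs, palt_covered):
    """Approach 4 combined with 3: decide once per script, not per glyph."""
    return not script_hinted_fixed_width(script_glyphs, palt_covered)


# Hypothetical font data: only some hiragana have 'palt' entries.
HIRAGANA = {"\u3042", "\u3044", "\u3046"}
LATIN = {"A", "V"}
PALT_COVERED = {"\u3044", "\u3046"}
```

With this sample data the naive check would kern U+3042 but not its neighbours, while the per-script decision leaves all hiragana unkerned and all Latin glyphs kerned.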

In any approach, to mitigate the impact on non-CJK scripts, we might be able to skip this 'kerning exemption' behavior, for instance, when the font contains a 'meta' table with a 'dlng' tag, or when the end user explicitly declares the language they want to set text in, like <p lang="ja"> in HTML.

As for the proposal, its monospace/proportional distinction resembles post.isFixedPitch and is confusing, as Hattori-san says. Kida-san currently suggests that replacing monospaced with full-width should be good, while Taro-san prefers the word fixed-width.

My starting point was that the wording monospaced implies 1. or 2., but I found there's an app that implements 3. to achieve the same goal without introducing the idea of CJK widths. The spec should leave some room for variations while concisely describing our expectations, so I think discussing approaches other than 1. is valuable for improving the proposal.

kidayasuo commented 11 months ago

@takaakifuji thank you for the good summary.

Just one point. I guess 1 & 3 are in the same category in that they are based on character codes and the text is segmented based on properties of the characters, with the only difference being which properties they use. Is my understanding correct?

macnmm commented 11 months ago

Without special info added to the fonts, I don't think it is possible for the app to calculate with 100% accuracy whether or not a given font (or subset of the font's glyphs) is designed to be monospaced or proportional, or for that matter, whether the designer intended them to be kerned by default or as an option. Using Unicode ranges for a definition of "non-CJK" (and therefore presuming those ranges to be kerned by default) is easier, but this will be app-specific and therefore not consistent or 100% accurate -- this would be method 1, above. I only mention the phrase "mono-spaced CJK" to allow for proportional CJK fonts (that kern by default) as an edge case, and to make clear how one determines the default behavior correctly. Incidentally, we use the definition "is this glyph upright in vertical text?" as a way to avoid non-CJK but nonetheless full-width glyphs being treated as proportional and kerned incorrectly, which, as mentioned, can happen with method 3.

kidayasuo commented 11 months ago

Using UAX 50 Unicode Vertical Text Layout might be a great option. Characters that need to be rotated in vertical text, such as Latin characters, have a strong horizontal binding between characters and are therefore kerned, whereas ones that can stay upright have a weaker tie to the characters on their left and right and have more individual freedom.

Is this something InDesign uses? @macnmm

takaakifuji commented 11 months ago

@kidayasuo As I understand it, that's correct in the sense that both rely on a property of each character. I may just have wanted to emphasize the nature of 3., which only uses a more generic property and is less coupled with the idea of CJK widths.

BTW, I had a look at the terminology in some other registered features in the current spec:

They might be used to mean something subtly different in each context, but we are seeing a range of expressions. As the current 'kern'/'vkrn' definition already forbids the usage with 'fwid'/'hwid' glyphs in the Feature Interaction section, I hope we can find a good way to make the whole mechanics clearer. Plus, 'kern' has long been known as one of the on-by-default-regardless features for both app and font developers, so the line 'may be activated' sounds a bit sudden to me from a non-CJK standpoint.

Thank you for the insight @macnmm! I'm sorry if I am just confusing the discussion.

macnmm commented 11 months ago

@takaakifuji I agree with you on making the whole mechanism clearer. The CJK world is fraught with contradictions, special cases, and interdependencies, and that is what we are trying to surface and make clear. Kern being on by default is not true for CJK, yet few know this until they get user complaints or bugs (or their output just looks bad). Some browsers apparently kern CJK incorrectly; Photoshop just turns all kerning off in CJK, not offering our "和文等幅" (Metrics/Auto for Roman, off for CJK) default found in more text-heavy apps. In my document, I attempted to describe the need for something like 和文等幅 without naming it explicitly. I tried to describe the specific type of CJK glyphs that should not be kerned by default, so engine devs could set a reasonable default with better nuance. I think foundries already know that using 'palt' saves them from making a lot of nearly identical 'kern' pairs. The issue was that browser and other engine devs were unaware of CJK and its need for a different, nuanced default value.

takaakifuji commented 10 months ago

@macnmm

Thank you so much for clarifying the intention!

As a user, I understand what InDesign's 和文等幅 achieves, in short, as 和文はベタ、欧文はプロポーショナル (Japanese text set solid, Western text proportional). The phrase leaves many technical ambiguities, but I think the intention is clear. And it's not about the widths but about the writing script. So, descriptions along the lines of

Script/language sensitivity: Shall be conditionally applied to CJK glyph runs to achieve the '和文はベタ、欧文はプロポーショナル' setting by default, and become fully script-insensitive once 'palt' gets activated.

thus

Script/language sensitivity: Shall be conditionally applied to CJK glyph runs so that they will be set solid (placed on a grid) by default, and become fully script-insensitive once 'palt' gets activated.

should at least tell devs that they need a different, nuanced default value. It might be a setback, or too unspecific, as it fails to mention edge cases like proportional-by-default CJK fonts, but my original concern was that the idea of CJK widths is too hard to explain in the limited length of text, and may become confusing to font developers as well.

What do you think of the idea of avoiding the term mono-spaced CJK in the first place?

macnmm commented 10 months ago

So, my intention with the phrasing "monospaced (e.g. CJK script) runs" is to indicate that kerning must happen by default on proportional glyphs only, no matter what script they are. In other words, default kerning is about proportionality/monospaced-ness, not about the script. But since CJK script is normally monospaced, by extension it should not be kerned by default.

I think eliminating the "monospaced" wording will focus on font script and CJK only, when my point is valid for monospaced Latin fonts (or any script fonts) as well as CJK.

kidayasuo commented 10 months ago

Thank you @takaakifuji and @macnmm for the great discussion regarding Japanese fonts having both mono-spaced and proportional glyphs and thus needing different treatments within a style run. I agree and believe this is one of the critical parts of the proposed text, especially considering the Photoshop case @macnmm mentioned.

As this bug itself is about clarifying the definition of “mono-spaced” CJK, I created a separate issue tracking improvements on the text explaining the handling of mono-proportional mixed fonts, which constitute a majority of Japanese typefaces. Both are closely related but I believe the discussion becomes clearer by separating the two.

kidayasuo commented 10 months ago

Let’s get back to the original issue of how we define “mono-spaced CJK”.

By separating issues #2 and #5, I am implying that splitting the process as below makes things simpler:

  1. Tell if the character is “wide” by nature. This can be done by 1. or 3. in @takaakifuji’s comment.

  2. Tell if the character identified in step 1 is actually proportional. This happens when palt is applied or when the font is truly proportional.
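
The two steps above can be sketched as a small decision function. This is only an illustration: step 1 uses East_Asian_Width W/F from Python's `unicodedata` as an assumed stand-in for method 1, and the two boolean inputs for step 2 are hypothetical flags an app would have to supply itself.

```python
import unicodedata


def is_wide_by_nature(ch):
    """Step 1: character-level test (assumed: East_Asian_Width W or F)."""
    return unicodedata.east_asian_width(ch) in {"W", "F"}


def is_actually_proportional(palt_active, font_is_truly_proportional):
    """Step 2: a wide character is proportional if 'palt' is applied
    or if the font itself is truly proportional."""
    return palt_active or font_is_truly_proportional


def kern_by_default(ch, palt_active=False, font_is_truly_proportional=False):
    """Kern non-wide characters; kern wide ones only if proportional."""
    if not is_wide_by_nature(ch):
        return True
    return is_actually_proportional(palt_active, font_is_truly_proportional)
```

Keeping the steps as separate functions mirrors the split between the two issues: the first can be swapped for a script-based test (method 3) without touching the second.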

kidayasuo commented 10 months ago

Tell if the character is “wide” by nature. This can be done by 1. or 3. in @takaakifuji’s comment.

What are the pros and cons of solutions 1 and 3?

macnmm commented 5 months ago

In the AHG meeting of 2024-03-19 it was decided to move forward with revised documentation of palt and kern, and to keep mention of "monospaced" glyphs and the defaults issue (kern not always on by default, despite its original intent). Adding the UAX11 suggestion should clarify what is meant by "monospaced Japanese". In addition, a new apkn feature was suggested as a much cleaner solution to default CJK kerning/proportionality, which we will further explore with font foundries.

PeterCon commented 5 months ago

I'm trying to wrap up OpenType 1.9.1 soon (it's been in alpha stage for a long time now). If folk here are agreeable, I'd like to incorporate changes for this into OT 1.9.1, and it would be helpful to discuss further in the corresponding microsoftdocs issue that Nat opened: MicrosoftDocs/typography-issues#1069.

kidayasuo commented 5 months ago

Thank you @PeterCon for offering your assistance. Having the support of an expert like you is truly reassuring.

Yes, if others do not object, I am fine with you incorporating it into OT 1.9.1. Thank you!