harfbuzz / harfbuzz

HarfBuzz text shaping engine
http://harfbuzz.github.io/

Yezidi characters cannot be ligated by ccmp #4667

Closed: cheonhyeongsim closed this issue 1 month ago

cheonhyeongsim commented 1 month ago

hebr.zip

HarfBuzz does not ligate cross-cluster sequences in the Yezidi script. The original Yezidi proposal mentions some historical ligatures. hlig would indeed be the better fit for historical ligatures, but I also want them to work in plain text, so I use ccmp to ligate the ZWJ sequence (so that the default sequence without ZWJ is not ligated). For example, 10EA0 200D 10E86 should form a ligature. With rclt the ligature works well, but with ccmp nothing happens. The font file is in the attached zip file.
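For anyone trying to reproduce this, here is a minimal Python sketch that builds the test string from the code points quoted above. The helper names are made up for illustration, and the comma-separated `U+XXXX` list is only the form one would typically expect a shaping tool's code point input to accept; check the tool's own help before relying on it.

```python
import unicodedata

# Code points quoted in the report: Yezidi letter, ZWJ, Yezidi letter.
SEQUENCE = [0x10EA0, 0x200D, 0x10E86]


def to_text(codepoints):
    """Join code points into a Python string (hypothetical helper)."""
    return "".join(chr(cp) for cp in codepoints)


def to_unicodes_arg(codepoints):
    """Format code points as a comma-separated U+XXXX list (hypothetical helper)."""
    return ",".join(f"U+{cp:04X}" for cp in codepoints)


text = to_text(SEQUENCE)
unicodes_arg = to_unicodes_arg(SEQUENCE)
zwj_name = unicodedata.name(text[1])  # the middle character is the joiner

print(unicodes_arg)
print(zwj_name)
```

The same helpers work for the Hebrew comparison case by swapping in `[0x05D0, 0x200D, 0x05DC]`.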

I believe this is not a Chrome issue but a HarfBuzz one: I tested it in both Crowbar and hb-shape.

Yezidi is not an Indic script, so why can't ccmp ligate cross-cluster sequences? Hebrew is very similar to Yezidi: both are RTL, non-cursive scripts. The attached font also covers Hebrew, and you can see that 05D0 200D 05DC works well, even though it is also a ZWJ sequence under ccmp.

By the way, Khitan Small Script has the same issue. I even wondered whether HarfBuzz mistakenly treats Khitan Small Script as an Indic script.

behdad commented 1 month ago

cc @jfkthame

Yezidi normally goes through the Universal Shaping Engine (USE), which I think applies ccmp only within a cluster. If you don't want the Universal Shaping Engine, just use the DFLT script tag for it instead.
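To make the intra-cluster point concrete, here is a toy Python model, not HarfBuzz code. It assumes the sequence is segmented into two clusters with the break after the ZWJ, which is an assumption for illustration; the glyph names and the ligature name are hypothetical too. A ligature rule applied one cluster at a time never sees the full three-glyph input, while the same rule applied across the whole run matches.

```python
def apply_ligature(glyphs, rule_in, rule_out):
    """Replace each occurrence of the glyph sequence rule_in with rule_out."""
    out, i, n = [], 0, len(rule_in)
    while i < len(glyphs):
        if glyphs[i:i + n] == rule_in:
            out.append(rule_out)
            i += n
        else:
            out.append(glyphs[i])
            i += 1
    return out


def shape(clusters, rule_in, rule_out, per_cluster):
    """Apply the rule either cluster by cluster or across the whole run."""
    if per_cluster:
        # The rule only ever sees the glyphs of a single cluster.
        result = []
        for cluster in clusters:
            result.extend(apply_ligature(cluster, rule_in, rule_out))
        return result
    # The rule sees the flattened run, so it can match across cluster breaks.
    run = [glyph for cluster in clusters for glyph in cluster]
    return apply_ligature(run, rule_in, rule_out)


RULE_IN = ["u10EA0", "u200D", "u10E86"]      # hypothetical glyph names
RULE_OUT = "u10EA0_u10E86.liga"              # hypothetical ligature glyph
CLUSTERS = [["u10EA0", "u200D"], ["u10E86"]]  # assumed segmentation

intra_only = shape(CLUSTERS, RULE_IN, RULE_OUT, per_cluster=True)
whole_run = shape(CLUSTERS, RULE_IN, RULE_OUT, per_cluster=False)
print(intra_only)
print(whole_run)
```

In this model the `per_cluster=True` path loosely corresponds to the ccmp behavior reported here, and the `per_cluster=False` path to what the reporter sees with rclt.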

cheonhyeongsim commented 1 month ago

Yeah, DFLT works well (also for Khitan Small Script). But since the very similar Hebrew script can do cross-cluster ligatures with ccmp, will Yezidi also be supported in a future version (of USE or of HarfBuzz)?

behdad commented 1 month ago

> Yeah DFLT works well (also for Khitan Small Script). But since the very similar Hebrew script can do cross-cluster ligatures by ccmp, will Yezidi be also supported in the future version (of USE or of Harfbuzz)?

Yezidi is explicitly listed under the USE shaper:

https://learn.microsoft.com/en-us/typography/script-development/use#writing-system-and-language-tags

If indeed it's like Hebrew and does not adhere to the USE model, it should be reported to Microsoft. In the meantime, I suggest using DFLT.
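As a rough illustration of why the DFLT workaround helps, here is a simplified Python sketch of the selection logic as described in this thread. It is not HarfBuzz's actual implementation: the function name, the return values, and the two-entry script set are hypothetical, and the tags `yezi` and `kits` are assumed to be the OpenType script tags for Yezidi and Khitan Small Script.

```python
# Illustrative subset only; the real list of USE scripts is much longer.
USE_SCRIPTS = {"yezi", "kits"}


def pick_shaping_model(text_script, font_script_tag):
    """Return which model shapes the run, per the simplified rules below.

    text_script: script of the text being shaped (OpenType-style tag).
    font_script_tag: script tag under which the font's features were found.
    """
    if text_script not in USE_SCRIPTS:
        # For example Hebrew, which is not routed through USE.
        return "non-USE"
    if font_script_tag == "DFLT":
        # The font only targets DFLT, so the USE model is skipped.
        return "non-USE"
    return "use"


print(pick_shaping_model("yezi", "yezi"))
print(pick_shaping_model("yezi", "DFLT"))
print(pick_shaping_model("hebr", "hebr"))
```

Under these assumptions, a Yezidi font that registers its features under DFLT only is shaped without the per-cluster restriction, which matches what is reported in this thread.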

dscorbett commented 1 month ago

This is working as intended. The USE model is not just for Indic scripts: it’s supposed to be adequate for any kind of script. You can switch to the default shaper, or use a “standard typographic presentation” feature like 'rclt' or 'rlig'.

cheonhyeongsim commented 1 month ago

Thank you for the explanation. Now I see that some scripts are meant to be shaped with the USE model. But I have another question: why does ccmp only do intra-cluster substitutions? I did not see anything about this limitation at https://learn.microsoft.com/en-us/typography/opentype/spec/features_ae#tag-ccmp.

behdad commented 1 month ago

See https://learn.microsoft.com/en-us/typography/script-development/use#default-glyph-pre-processing-group

cheonhyeongsim commented 1 month ago

I see that "Standard Scripts", "Complex Scripts", and some other scripts are listed separately in the left sidebar of that page. So scripts not listed under the USE shaper follow different shaping rules, right? Scripts like Latin, Greek, and Cyrillic can use ccmp for cross-cluster substitutions, but scripts listed under the USE shaper cannot. May I ask what this difference is intended for? Thank you very much.

devosb commented 1 month ago

> Yeah DFLT works well (also for Khitan Small Script). But since the very similar Hebrew script can do cross-cluster ligatures by ccmp, will Yezidi be also supported in the future version (of USE or of Harfbuzz)?

I am curious what application (other than Crowbar and hb-shape) you are using DFLT with. I have used DFLT with a Limbu script font to avoid a bug (now fixed) in the USE. However, with Notepad and Word on Windows (so DirectWrite was the shaper) the Limbu characters were sent to the USE even though the font did not specify the limb shaper. Likewise, DirectWrite sends Grantha characters to the USE, even though the font specified tml2 shaper.

behdad commented 1 month ago

> I am curious what application (other than Crowbar and hb-shape) you are using DFLT with. I have used DFLT with a Limbu script font to avoid a bug (now fixed) in the USE. However, with Notepad and Word on Windows (so DirectWrite was the shaper) the Limbu characters were sent to the USE even though the font did not specify the limb shaper. Likewise, DirectWrite sends Grantha characters to the USE, even though the font specified tml2 shaper.

That looks like a DWrite bug to me.

cheonhyeongsim commented 1 month ago

> I am curious what application (other than Crowbar and hb-shape) you are using DFLT with.

Google Chrome, as I mentioned in #4661 and #4662.

> That looks like a DWrite bug to me.

Agree.