U+0640 (AKA kashida AKA tatweel) mark positioning is incorrectly removed from`mark` feature to `abvm`/`blwm` features

googlefonts / ufo2ft

A bridge from UFOs to FontTools objects (and therefore, OTFs and TTFs).

MIT License

152 stars 43 forks source link

U+0640 (AKA kashida AKA tatweel) mark positioning is incorrectly removed from`mark` feature to `abvm`/`blwm` features #579

Closed khaledhosny closed 1 year ago

khaledhosny commented 2 years ago

With this PR, my kashida mark to base is moved from mark feature to abvm/blwm and they don’t get applied for Arabic except in HarfBuzz.

Originally posted by @khaledhosny in https://github.com/googlefonts/ufo2ft/issues/567#issuecomment-1014679531

khaledhosny commented 2 years ago

I think this is because unicodedata.script_extension() returns {'Phlp', 'Arab', 'Mand', 'Ougr', 'Syrc', 'Mani', 'Sogd', 'Adlm', 'Rohg'} for 0x0640 and several of these are USE scripts, but:

For characters that can b e used in multiple scripts, it should be added to all relevant features for these scripts, not only one of them.
If the font does not have any characters of a given script other than shared ones, it should not be considered (alternatively font’s languagesystems should to be consulted).

khaledhosny commented 2 years ago

Honest question, does the abvm/blwm and mark split make any difference for USE at all? The spec says all positioning features all applied simultaneously, and it applies these three features. The distinction might make difference for Indic shapers that does not apply mark, but I don’t see how this applies to USE.

anthrotype commented 2 years ago

I don't know, maybe @behdad or @simoncozens do?

simoncozens commented 2 years ago

The problem is that treating Indic scripts and USE scripts differently leads to bugs like #566. That's why we need to find a way to treat them the same.

khaledhosny commented 2 years ago

Now treating non-Indic and USE scripts differently is causing bugs like this one. What about some radical idea; write one mark positioning lookup and add it to the three features indiscriminately?

anthrotype commented 2 years ago

wouldn't that be applied twice?

simoncozens commented 2 years ago

Yes, that would fail if the user adds a manual adjustment rule to the end of, say, abvm after the first mark attachment takes place, and then mark comes along and attaches it back to the base again.

khaledhosny commented 2 years ago

If it is the same lookup referenced from different features it should be applied only once. But mark positioning in not cumulative, so even if a broken implementation applied the lookup three times it would still give the same output.

khaledhosny commented 2 years ago

The manual lookup will come last, so it does not matter what feature it was added to.

anthrotype commented 2 years ago

mark positioning in not cumulative, so even if a broken implementation applied the lookup three times it would still give the same output

good point. Besides, building one lookup for all mark-related features would simplify the code a bit. I'm warming up to the idea.

simoncozens commented 2 years ago

I think Khaled's two points:

For characters that can b e used in multiple scripts, it should be added to all relevant features for these scripts, not only one of them.

If the font does not have any characters of a given script other than shared ones, it should not be considered (alternatively font’s languagesystems should to be consulted).

are the ones to address.

khaledhosny commented 2 years ago

Aren't all GPOS lookups applied in one stage? If so the feature tag does not really matter, only the lookup order. A lookup defined in a prefix can't override automatic lookups inserted after it already regardless of what we are going to do here.

khaledhosny commented 2 years ago

This is breaking several fonts for me already, and I don’t want to start forking feature writers! One less far reaching fix, is to limit USE_SCRIPTS to scripts that we know kern, mark, and mkmk features are not applied for them.

khaledhosny commented 2 years ago

(This affects more than kashida, it affects any code point with script extension that includes on of the USE scripts and there is quite a few of them).

behdad commented 2 years ago

Can we unblock Khaled's fonts please?

simoncozens commented 2 years ago

One less far reaching fix, is to limit USE_SCRIPTS to scripts that we know kern, mark, and mkmk features are not applied for them.

I don't understand this. kern/mark/mkmk are applied for all USE scripts, right?

simoncozens commented 2 years ago

Daft question, can we just use mark for everything? The abvm/blwm split is error-prone anyway. (https://github.com/fonttools/fonttools/issues/2510#issuecomment-1018380320)

madig commented 1 year ago

Daft question, can we just use mark for everything?

Isn't the problem that some non-HarfBuzz shapers demand abvm/blwm? HB seems fine with putting stuff into mark.

khaledhosny commented 1 year ago

I think HarfBuzz enables all the three features everywhere, but I don’t think other implementations do the same. I know that Core Text does not enable abvm/blwm for Arabic. Other scripts might get abvm/blwm but not mark.