IBM Plex Sans Arabic issue with Tashkeel

AMKamel commented 4 months ago

Describe the bug: When you type the word اللَّهُ, it gets rendered incorrectly (see the attached screenshot).

To Reproduce: Type the word اللَّهُ.

Expected behavior: It should appear as it does above.

Screenshots: Added.

Additional context: N/A

emmamarichal commented 4 months ago

@yanone could you take a look?

yanone commented 4 months ago

This is a problem that exists with many Arabic fonts, even my own (work-in-progress).

These fonts have a ligature for الله whose design already includes tashkeel that were never typed: a user just types ا ل ل ه and it turns into the ligature, tashkeel included. This الله that I typed here probably also renders with tashkeel.

That's of course problematic. In the issue above, the typed text itself contains tashkeel, which are then applied on top of the ligature that already contains tashkeel.
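To make the doubling concrete, here is a minimal sketch that lists the marks contained in the typed text. It assumes the reported word is spelled alef, lam, lam, shadda, fatha, heh, damma (the exact order of shadda and fatha in the report is an assumption), and it treats every nonspacing mark (Unicode category Mn) as tashkeel.

```python
import unicodedata

# Assumed spelling of the reported word:
# alef, lam, lam, shadda, fatha, heh, damma.
WORD = "\u0627\u0644\u0644\u0651\u064E\u0647\u064F"


def split_marks(text):
    """Return (base_letters, marks); nonspacing marks (Mn) count as tashkeel."""
    bases = [c for c in text if unicodedata.category(c) != "Mn"]
    marks = [c for c in text if unicodedata.category(c) == "Mn"]
    return bases, marks


bases, marks = split_marks(WORD)
print("base letters:", " ".join(f"U+{ord(c):04X}" for c in bases))
for c in marks:
    print(f"typed mark: U+{ord(c):04X} {unicodedata.name(c)}")
```

The four base letters are exactly the sequence that triggers the ligature, so a lookup that skips marks still substitutes the composed glyph, and the three typed marks are then positioned on top of the tashkeel already drawn into it.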

The easiest solution would be to remove the composed ligature from the font, requiring users to explicitly apply tashkeel. But this has not been the common practice in font-making. The common practice until now has been to type plain ا ل ل ه and receive a ligated الله incl. tashkeel.

Another solution is to write the OpenType feature code so that it handles marks correctly.

In my own font (just now for testing) I've been able to eliminate the tashkeel collisions by removing the IgnoreMarks flag from the lookup that forms the allah-ar ligature, so the code looks like this:

lookup rlig_arab_1 {
  # Arabic
  script arab;
    # Default
    language dflt;
    lookupflag RightToLeft;
    sub alef-ar lam-ar.init lam-ar.medi heh-ar.fina by allah-ar;
} rlig_arab_1;

Which means that, as soon as marks are involved, the ligature will not be substituted. The result speaks for itself:

[Screenshot 2024-05-08 at 14.08.08]

Before we move on to fix the issue in any of the fonts, I would like to proceed with defining a Fontbakery check for this.

I would make this a shaping check that counts the number of glyphs after shaping. For example: if the sequence ا ل ل ه turns into a single glyph, then also measure what ا ل ل ه ُ turns into. If it's two glyphs (ligature + tashkeel), the OpenType code ignores marks, and the check should FAIL.
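A rough sketch of that counting logic, not an actual Fontbakery check: the function name `check_allah_ligature`, the PASS/FAIL/SKIP strings and the pluggable `Shaper` type are made up for illustration. The optional helper assumes the third-party uharfbuzz package and a font path supplied by the caller.

```python
from typing import Callable, List

ALLAH = "\u0627\u0644\u0644\u0647"  # alef, lam, lam, heh
DAMMA = "\u064F"

# A shaper maps input text to the list of glyph names after shaping.
Shaper = Callable[[str], List[str]]


def make_hb_shaper(font_path: str) -> Shaper:
    """Build a shaper on top of uharfbuzz (third-party, imported lazily)."""
    import uharfbuzz as hb

    blob = hb.Blob.from_file_path(font_path)
    font = hb.Font(hb.Face(blob))

    def shape(text: str) -> List[str]:
        buf = hb.Buffer()
        buf.add_str(text)
        buf.guess_segment_properties()
        hb.shape(font, buf)
        return [font.glyph_to_string(info.codepoint) for info in buf.glyph_infos]

    return shape


def check_allah_ligature(shape: Shaper) -> str:
    """Hypothetical check: does the ligature still form when a mark is typed?"""
    plain = shape(ALLAH)
    if len(plain) != 1:
        return "SKIP"  # the font has no single-glyph ligature to worry about
    marked = shape(ALLAH + DAMMA)
    # One ligature glyph plus one mark glyph: the lookup skipped the typed mark.
    return "FAIL" if len(marked) == 2 else "PASS"
```

Usage would be along the lines of `check_allah_ligature(make_hb_shaper("SomeFont.ttf"))`, with the path being a placeholder. One limitation of counting alone: a font that swaps in a second, mark-free ligature when marks are typed also produces two glyphs, so this sketch would report FAIL for it.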

@khaledhosny @simoncozens, what are your thoughts on this?

khaledhosny commented 4 months ago

See https://github.com/IBM/plex/issues/407

simoncozens commented 4 months ago

My thoughts are (a) it's a good candidate for a shaperglot check, and (b) https://www.unicode.org/notes/tn46/tn46-1.pdf

yanone commented 4 months ago

@simoncozens Is such a check implementable using the current set of shaperglot instructions? If so, how would you implement it?

Khaled's idea in the linked thread of offering two ligatures is valid (though Bold Monday's implementation of putting all marks on the second ل is surely wrong), as is offering just one ligature and ignoring it as soon as marks are present.

At least I don't see how this can be solved using a static shaperglot test definition. You would have to compare the output buffers of ا+ل+ل+ه against ا+ل+ل+ه+ُ (for example), but check that just the base ا+ل+ل+ه differs. That means comparing two sequences that don't have the same input string, so their outputs are going to differ in any case.

I think this would be a dynamic check written in code. Then of course it makes no difference whether it's in FB or shaperglot, with the latter being the better host.
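As a minimal sketch of what such a dynamic comparison could look like, assuming both inputs have already been shaped into lists of glyph names and that the set of mark glyphs is known (for example from the font's GDEF glyph classes): the function names and the glyph names other than allah-ar are hypothetical.

```python
from typing import Iterable, List, Set


def base_glyphs(glyphs: Iterable[str], mark_glyphs: Set[str]) -> List[str]:
    """Drop mark glyphs so that only the base glyphs of a shaped run remain."""
    return [g for g in glyphs if g not in mark_glyphs]


def ligature_respects_marks(
    plain: List[str], marked: List[str], mark_glyphs: Set[str]
) -> bool:
    """True if the bases of the marked run differ from the unmarked run.

    plain  -- shaped output for the letters alone
    marked -- shaped output for the same letters plus typed tashkeel
    """
    return base_glyphs(marked, mark_glyphs) != list(plain)


marks = {"damma-ar"}

# Lookup ignores marks: the same composed ligature appears in both runs.
print(ligature_respects_marks(["allah-ar"], ["allah-ar", "damma-ar"], marks))

# Ligature suppressed when marks are typed: the bases differ.
print(ligature_respects_marks(
    ["allah-ar"],
    ["alef-ar", "lam-ar.init", "lam-ar.medi", "heh-ar.fina", "damma-ar"],
    marks,
))
```

Stripping the marks before comparing is what gets around the different input strings. Because it compares glyph identity rather than glyph count, it accepts both valid approaches: dropping the ligature when marks are present, and substituting a second ligature without built-in tashkeel.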

google / fonts

IBM Plex Sans Arabic issue with Tashkeel #7611