Open Manishearth opened 7 years ago
Screenshot on my system, with buggy fonts highlighted in red:
Creating these kinds of ligatures, especially RIAL and ALLAH, is very common in fonts.
The bug here seems to be the font assigning U+FDF2 to a ligature glyph for the second joining segment of the word ALLAH (which is LLAH), instead of creating a composed glyph for U+FDF2 using the ligature.
CLDR data, which is our primary source for character support, lacks any information about ligatures (and their possible code points). Since this bug is common, especially in the more open-source fonts, I think we can cover the topic in ALReq and maybe even provide an annex with some details about the important ligatures and how fonts should implement them (like the detail here that the ligature glyph doesn't get the U+FDF2 code point, but rather U+FDF2 uses the ligature).
What do you think?
Since U+FDF2 is a presentation form character, I think we shouldn’t say much more than discouraging the use of presentation forms in text input. As for the fonts, though they indeed break the glyph for U+FDF2, the ligatures for الله and لله still work correctly.
Right, @khaledhosny. True that we want to discourage them in text. So, the question is, do we want to cover the issue for the sake of improving font development processes and font products for the script?
Since the topic is not exactly text layout, I think it could be a separate (wiki) document, or maybe an annex on font development.
I agree this does not belong in the main document; an annex on Arabic font development best practices might be a good idea.
My thinking is:
HTML code to test your fonts:
<p> ﷲ اللَّه الله </p>
@behnam and @khaled, +1 to cover font development best practices.
The Unicode Standard 11.0.0 says the following in section 9.2 Arabic Presentation Forms-A: U+FB50–U+FDFF, Word Ligatures (this was added in Unicode 7.0.0):
U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM is a very common ligature, used to display the name of God. When the formation of the allah ligature is desired, the recommended way to represent the word would be <alef, lam, lam, shadda, superscript alef, heh> <0627, 0644, 0644, 0651, 0670, 0647>. In non-Arabic languages, other forms of heh, such as heh goal (U+06C1), may also form the ligature. Extra care should be taken not to form the ligature in the absence of the shadda and the superscript alef, as the sequence <alef, lam, lam, heh> and <alef, lam, lam, shadda, heh> exist in Persian and other languages with different meanings or pronunciations, where the formation of the ligature would be incorrect and inappropriate.
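To make the distinction in the quoted passage concrete, here is a minimal Python sketch of a check for the sequence Unicode recommends. The helper name `unicode_recommends_ligature` is hypothetical, and the check is only an illustration of the quoted rule (it requires the leading alef and accepts heh or heh goal); it is not how any particular font or shaping engine decides.

```python
import re

ALEF, LAM, HEH = "\u0627", "\u0644", "\u0647"
SHADDA, SUPERSCRIPT_ALEF = "\u0651", "\u0670"
HEH_GOAL = "\u06C1"

# Sequence the quoted Unicode text recommends for the ligature.
RECOMMENDED = ALEF + LAM + LAM + SHADDA + SUPERSCRIPT_ALEF + HEH

# Sequences the same passage says should not form the ligature.
BARE = ALEF + LAM + LAM + HEH
SHADDA_ONLY = ALEF + LAM + LAM + SHADDA + HEH

# alef, lam, lam, shadda, superscript alef, then heh or heh goal.
_PATTERN = re.compile(
    ALEF + LAM + LAM + SHADDA + SUPERSCRIPT_ALEF + "[" + HEH + HEH_GOAL + "]"
)


def unicode_recommends_ligature(text: str) -> bool:
    """Hypothetical helper: True if text contains the recommended sequence."""
    return _PATTERN.search(text) is not None


for label, sample in [
    ("recommended", RECOMMENDED),
    ("bare", BARE),
    ("shadda only", SHADDA_ONLY),
]:
    print(label, unicode_recommends_ligature(sample))
```

As later comments in this thread point out, many fonts ligate the bare sequence anyway, so this reflects the Unicode recommendation rather than common font behaviour.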
I decided it was time for me to explore this a little more deeply. Here are some other results. I created a test page at: https://w3c.github.io/alreq/gap-analysis/tests/ligation/ligation_000.html
Here are some results I screen-captured on my Mac. Grey backgrounds, from a very quick scan, indicate things I think are probably incorrect.
Essentially, this whole thing is quite broken, it seems. (Which is surprising given the content involved.)
Arial overcompensating by adding a double shadda/alif is very surprising (and somewhat hilarious) to me given how commonly that font is used.
Then again, I guess very little about non-Latin text not working on computers should surprise me anymore 😩
My perception is that, contrary to what Unicode suggests, Arabic users expect bare [alef] lam lam heh to ligate, and that is what almost all Arabic fonts do. Arabic words that match the same sequence of letters but are not the name of God are so uncommon that I never encountered any of them until I was researching this very issue. In Amiri I approached this from the other end: actively matching sequences that are unlikely to be the name of God and unligating them, e.g. خالله does not ligate, but فالله ligates while فالَله does not.
When I discussed this issue with @roozbehp he had some examples of Persian words that do this, IIRC.
Just to lay it out, there are multiple issues here, of varying severity:
As @r12a notes in https://r12a.github.io/scripts/arabic/block#charFDF2 the compatibility decomposition for FDF2 is <alif, lam, lam, heh> (“≈ [isolated] 0627 0644 0644 0647”).
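The compatibility decomposition can be checked directly with Python's standard `unicodedata` module. This is a small sketch for inspection only; the variable names are illustrative.

```python
import unicodedata

ALLAH_LIGATURE = "\uFDF2"  # ARABIC LIGATURE ALLAH ISOLATED FORM

# Raw decomposition field from the Unicode Character Database,
# including the <isolated> compatibility tag.
raw = unicodedata.decomposition(ALLAH_LIGATURE)

# NFKD applies the compatibility mapping and drops the tag.
expanded = unicodedata.normalize("NFKD", ALLAH_LIGATURE)
names = [unicodedata.name(ch) for ch in expanded]

print(raw)
for ch, name in zip(expanded, names):
    print(f"U+{ord(ch):04X} {name}")
```

The expansion contains neither shadda (U+0651) nor superscript alef (U+0670), so normalizing text that contains U+FDF2 yields the bare alef, lam, lam, heh sequence rather than the one Unicode recommends for the ligature.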
While the (non-normative) reference glyph is a ligature <alif, lam, lam, shadda, superscript alif, heh>, this hasn’t always been the case. In Appendix H, New Characters, of the Unicode Standard 1.1, the reference glyph used is a ligature <alif, lam, lam, heh> with neither shadda nor superscript alif. This may explain where the compatibility decomposition of FDF2 comes from.
The production process changed between Unicode 2.x and 3.0. From that point on, different custom software was used with an entirely new collection of TrueType fonts. With many upgrades, both to the software and the font collection, that process is still very much in place today.
Every update of the font collection bears the risk of unintentional changes, and not all of them are caught by reviewers. Therefore, it would take some digging to find out whether the change from a glyph matching the decomposition to a glyph adding shadda and alif was indeed intentional at the time.
I was curious to see if any fonts have FDF2 as alif, lam, lam, heh without shadda and superscript alif.
I managed to find a handful:
There are most probably more.
Including these, there are also more typefaces that do not ligate <lam, lam, heh> (regardless of what FDF2 glyph they have). Some of these do have an optional discretionary ligature feature that forms the ligature.
There may also be fonts that do FDF2 with shadda but no superscript alif, like https://www.linotype.com/1079191/hasan-alquds-unicode-regular-product.html?site=webfonts&format=ot-ttf&branding=std or fonts that do FDF2 with shadda and fatha, like https://fonts.google.com/specimen/Harmattan.
U+FDF2 'ARABIC LIGATURE ALLAH ISOLATED FORM' (ﷲ) is supposed to render as alef-lam-lam-heh (with diacritics), but in some fonts, including Courier New, the alef is missing.
http://www.fileformat.info/info/unicode/char/fdf2/fontsupport.htm
The code point could conceivably mean "the main l-l-h ligature in 'allah'"; however, the spec decomposes it as a-l-l-h, so all fonts should render the leading alef.