Open behnam opened 7 years ago
Based on discussion started in https://github.com/w3c/alreq/issues/125
Also, these two:
FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
FDFD ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
From the Unicode Standard (Version 9.0.0): http://www.unicode.org/versions/Unicode9.0.0/ch09.pdf
Word Ligatures. The signs and symbols encoded at U+FDF0..U+FDFD are word ligatures sometimes treated as a unit. Most of them are encoded for compatibility with older character sets and are rarely used, except the following: [...]
So, we need to consider addressing these questions in Section 2.1 Encoding.
Also, these: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{subhead=Word%20ligatures}
Arabic Presentation Forms A — Word ligatures items: 12
ﷰ U+FDF0 ARABIC LIGATURE SALLA USED AS KORANIC STOP SIGN ISOLATED FORM
ﷱ U+FDF1 ARABIC LIGATURE QALA USED AS KORANIC STOP SIGN ISOLATED FORM
ﷲ U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM
ﷳ U+FDF3 ARABIC LIGATURE AKBAR ISOLATED FORM
ﷴ U+FDF4 ARABIC LIGATURE MOHAMMAD ISOLATED FORM
ﷵ U+FDF5 ARABIC LIGATURE SALAM ISOLATED FORM
ﷶ U+FDF6 ARABIC LIGATURE RASOUL ISOLATED FORM
ﷷ U+FDF7 ARABIC LIGATURE ALAYHE ISOLATED FORM
ﷸ U+FDF8 ARABIC LIGATURE WASALLAM ISOLATED FORM
ﷹ U+FDF9 ARABIC LIGATURE SALLA ISOLATED FORM
ﷺ U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
ﷻ U+FDFB ARABIC LIGATURE JALLAJALALOUHOU
I think we can add something similar to the paragraph from the Unicode Standard above about the word ligatures, without going into much detail since their use varies. Maybe also mention the font issues with some of them.
Yeah. These code points, especially U+FDF2, do appear in text for a valid reason: authors want font fallback to move on to the next font that has the "ligature" (I would call them signs myself), instead of falling back to the spelled-out text representation.
And, when doing that, I think we can actually list the code points (the ranges) of the actual ligature characters, which we don't expect in text at all and which no font is required to support. This would make it clearer to everyone, both this spec and font developers, what is out of scope.
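As a rough sketch of how such ranges could be enumerated, the Python snippet below walks the two Arabic Presentation Forms blocks and collects every code point that carries a compatibility decomposition, skipping an allowlist of characters this thread says do appear in text. The allowlist `EXPECTED_IN_TEXT` and the helper names are assumptions made for illustration, not part of the spec or its tooling, and the exact output depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Arabic Presentation Forms-A and Forms-B block boundaries.
BLOCKS = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]

# Hypothetical allowlist: code points mentioned in this thread as appearing in text.
EXPECTED_IN_TEXT = {0xFD3E, 0xFD3F, 0xFDF2, 0xFDFA, 0xFDFC, 0xFDFD}


def is_presentation_form(cp):
    """True if the code point has a tagged (compatibility) decomposition."""
    return unicodedata.decomposition(chr(cp)).startswith("<")


def collapse(code_points):
    """Collapse sorted code points into inclusive (start, end) ranges."""
    ranges = []
    for cp in code_points:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges


def unexpected_ranges():
    """Ranges of presentation forms that are not on the allowlist."""
    cps = [
        cp
        for lo, hi in BLOCKS
        for cp in range(lo, hi + 1)
        if is_presentation_form(cp) and cp not in EXPECTED_IN_TEXT
    ]
    return collapse(cps)


if __name__ == "__main__":
    for start, end in unexpected_ranges():
        print(f"U+{start:04X}..U+{end:04X}")
```

Using the decomposition tag as the filter is only one possible criterion; a hand-maintained list would give finer control over borderline characters.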
And, "Currency" is a special case to talk about in general, which can cover the case for RIAL (and AFGHANI).
At the moment the script that generates the table consumes only CLDR data. We might need to add a local data file that overrides or adds to the CLDR data. The script would then use both data sources to generate the table. This way we'd have more control over the table content.
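A minimal sketch of that override idea, in Python, might look like the following. The function name `merge_characters`, the `add`/`remove` keys, and the sample data are all hypothetical; the actual script and the shape of its CLDR input are not shown in this thread.

```python
def merge_characters(cldr_chars, overrides):
    """Apply local additions and removals on top of CLDR-derived characters."""
    merged = set(cldr_chars)
    merged |= set(overrides.get("add", []))
    merged -= set(overrides.get("remove", []))
    return sorted(merged)


# Stand-in for characters read from CLDR (illustrative only).
cldr_chars = ["\u0627", "\u0644", "\u0647"]

# Hypothetical local override file content, using code points from this thread.
local_overrides = {
    "add": ["\uFD3E", "\uFD3F", "\uFDFC", "\uFDFD"],
    "remove": [],
}

table_chars = merge_characters(cldr_chars, local_overrides)
for ch in table_chars:
    print(f"U+{ord(ch):04X}")
```

Keeping the overrides in a separate file means the CLDR data can be refreshed without losing the local decisions.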
More codepoints that need coverage:
U+FD3E ORNATE LEFT PARENTHESIS
U+FD3F ORNATE RIGHT PARENTHESIS
U+FDFC RIAL SIGN
U+FDFD ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
U+FDFC RIAL SIGN https://r12a.github.io/uniview/?char=FDFC General category: Sc - Symbol, currency
U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM https://r12a.github.io/uniview/?char=FDF2 General category: Lo - Letter, other
Both have compatibility decompositions, and both have had special glyphs in many fonts (and in movable type, in pre-digital typesetting) for years.
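For anyone who wants to check these properties locally, here is a small Python sketch using the standard `unicodedata` module. The helper `describe` is an illustrative name, and the values reported come from the Unicode data bundled with the interpreter in use.

```python
import unicodedata


def describe(ch):
    """Collect name, general category, decomposition, and NFKC form of a character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "decomposition": unicodedata.decomposition(ch),
        "nfkc": unicodedata.normalize("NFKC", ch),
    }


if __name__ == "__main__":
    for ch in ("\uFDFC", "\uFDF2"):
        info = describe(ch)
        expanded = " ".join(f"U+{ord(c):04X}" for c in info["nfkc"])
        print(info["code_point"], info["name"], "gc=" + info["category"])
        print("  decomposition:", info["decomposition"])
        print("  NFKC:", expanded)
```

Note that NFKC normalization replaces each sign with its spelled-out letters, so any pipeline that normalizes text this way loses the distinction discussed here.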