w3c / alreq

Documenting gaps and requirements for support of Arabic Script languages on the Web and in eBooks.

Characters from Presentation Form blocks legitimate to use in text #128

Open behnam opened 7 years ago

behnam commented 7 years ago

Both have compatibility decompositions, and both have had special glyphs in many fonts (and in movable type, in pre-digital typesetting) for years.

behnam commented 7 years ago

Based on discussion started in https://github.com/w3c/alreq/issues/125

behnam commented 7 years ago

Also, these two:

From the Unicode Standard (Version 9.0.0): http://www.unicode.org/versions/Unicode9.0.0/ch09.pdf

Word Ligatures. The signs and symbols encoded at U+FDF0..U+FDFD are word ligatures sometimes treated as a unit. Most of them are encoded for compatibility with older character sets and are rarely used, except the following: [...]

behnam commented 7 years ago

So, we need to consider addressing these questions in Section 2.1 Encoding.

behnam commented 7 years ago

Also, these: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{subhead=Word%20ligatures}

Arabic Presentation Forms A — Word ligatures items: 12

 ‎ﷰ‎    U+FDF0  ARABIC LIGATURE SALLA USED AS KORANIC STOP SIGN ISOLATED FORM
 ‎ﷱ‎    U+FDF1  ARABIC LIGATURE QALA USED AS KORANIC STOP SIGN ISOLATED FORM
 ‎ﷲ‎    U+FDF2  ARABIC LIGATURE ALLAH ISOLATED FORM
 ‎ﷳ‎    U+FDF3  ARABIC LIGATURE AKBAR ISOLATED FORM
 ‎ﷴ‎    U+FDF4  ARABIC LIGATURE MOHAMMAD ISOLATED FORM
 ‎ﷵ‎    U+FDF5  ARABIC LIGATURE SALAM ISOLATED FORM
 ‎ﷶ‎    U+FDF6  ARABIC LIGATURE RASOUL ISOLATED FORM
 ‎ﷷ‎    U+FDF7  ARABIC LIGATURE ALAYHE ISOLATED FORM
 ‎ﷸ‎    U+FDF8  ARABIC LIGATURE WASALLAM ISOLATED FORM
 ‎ﷹ‎    U+FDF9  ARABIC LIGATURE SALLA ISOLATED FORM
 ‎ﷺ‎    U+FDFA  ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
 ‎ﷻ‎    U+FDFB  ARABIC LIGATURE JALLAJALALOUHOU
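
For anyone who wants to inspect these characters locally, here is a minimal sketch using Python's standard `unicodedata` module. It prints the name and the decomposition field for each code point in U+FDF0..U+FDFB. The helper name `describe_word_ligatures` is made up for illustration, and the output depends on the Unicode version bundled with the Python build.

```python
import unicodedata


def describe_word_ligatures(first=0xFDF0, last=0xFDFB):
    """Return (label, name, decomposition) for each code point in the range."""
    rows = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        rows.append((
            f"U+{cp:04X}",
            unicodedata.name(ch, "<unnamed>"),
            # Empty string when the character has no decomposition mapping.
            unicodedata.decomposition(ch),
        ))
    return rows


for label, name, decomposition in describe_word_ligatures():
    print(label, name, "|", decomposition or "(no decomposition)")
```

A decomposition that starts with a tag such as `<isolated>` is a compatibility decomposition, which is what NFKC/NFKD normalization would replace the character with.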

khaledhosny commented 7 years ago

I think we can add something similar to the paragraph from the Unicode Standard above about the word ligatures, without going into much detail, since their use varies. Maybe also mention the font issues with some of them.

behnam commented 7 years ago

Yeah. These code points, especially U+FDF2, do appear in text for a valid reason: authors want fallback to the next font that has the "ligature" (I would call them signs myself), instead of falling back to the plain-text representation.

And, when doing that, I think we can actually list the code points (the ranges) of the actual ligature characters, which we don't expect in text at all and which no font is required to support. This would make it clearer what is out of scope for everyone: this spec and font developers.

Also, "Currency" is a special case worth discussing in general, and that discussion can cover RIAL (and AFGHANI).

shervinafshar commented 6 years ago

At the moment the script consumes only CLDR data to generate the table. We might need to add a local data file as an override/addition to the CLDR data. The script would then use both of these data sources to generate the table. This way we'd have more control over the table content.
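
A rough sketch of what such a merge step could look like, in Python. Everything here is hypothetical: the function name `merge_with_override`, the shape of the override data (`"add"` and `"remove"` keys), and the row format are assumptions for illustration, not the actual table-generation script.

```python
def merge_with_override(cldr_entries, override):
    """Combine CLDR-derived entries with a local override (hypothetical format).

    cldr_entries: dict mapping a code point (int) to a row of table data.
    override: dict with optional "add" (code point -> row) and
              "remove" (iterable of code points) keys.
    """
    merged = dict(cldr_entries)          # leave the CLDR data untouched
    for cp in override.get("remove", ()):
        merged.pop(cp, None)             # ignore code points CLDR never listed
    merged.update(override.get("add", {}))
    return dict(sorted(merged.items()))  # stable code point order for the table


# Example: CLDR-derived data plus a local addition for U+FDFC RIAL SIGN.
cldr = {0x0627: {"name": "ARABIC LETTER ALEF"}}
local = {"add": {0xFDFC: {"name": "RIAL SIGN"}}}
table = merge_with_override(cldr, local)
```

Applying removals before additions means a local entry can both drop and re-add a code point with different row data.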

behnam commented 6 years ago

More codepoints that need coverage:

U+FD3E ORNATE LEFT PARENTHESIS
U+FD3F ORNATE RIGHT PARENTHESIS
U+FDFC RIAL SIGN
U+FDFD ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
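
To make the "expected in text" versus "not expected in text" split concrete, here is a minimal Python sketch. It assumes that the code points named in this thread (U+FD3E, U+FD3F, and U+FDF0..U+FDFD) end up on an allowed list, which is not settled. The names `ALLOWED` and `unexpected_presentation_forms` are invented for the example.

```python
# Block ranges as given in the Unicode code charts.
PRESENTATION_BLOCKS = (
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
)

# Assumption: the code points discussed in this thread as legitimate in text.
ALLOWED = {0xFD3E, 0xFD3F} | set(range(0xFDF0, 0xFDFE))


def unexpected_presentation_forms(text):
    """Return (index, 'U+XXXX') for presentation-form characters not in ALLOWED."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        in_block = any(lo <= cp <= hi for lo, hi in PRESENTATION_BLOCKS)
        if in_block and cp not in ALLOWED:
            hits.append((i, f"U+{cp:04X}"))
    return hits
```

Note that this sketch checks block ranges only, so it would also flag U+FEFF (the byte order mark, which sits at the end of the Forms-B block) and the noncharacters in Forms-A. A real check would need to treat those separately.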

shervinafshar commented 6 years ago

#107 and this one are along the same lines. A local override to CLDR might be the solution to both.