Open xfq opened 6 months ago
For Persian, there is a standard available: ISIRI-9147 (pp. 17-19 of the PDF). For Arabic, we couldn't find such a document and, if I recall correctly, we relied on CLDR data; the case of U+0020 for Arabic seems to be an error. We probably need to revisit this section for Arabic.
Also, in case you were unaware, we provisionally recorded our non-normative references in a spreadsheet here, with the aim of eventually migrating the content to the document. I added #278.
> The following tables list Unicode characters used for Arabic script.
What does this mean? Is it that these characters are available in Arabic keyboard layouts? Or that they're commonly used in online Arabic texts? And what sort of Arabic (Classical Arabic or Modern Standard Arabic)? This needs to be clarified.
U+0671 ARABIC LETTER ALEF WASLA is used in the Quran and Classical Arabic manuscripts, but not in MSA. The same goes for U+0653-U+0670.
Until such a standard for Arabic is published, it is safe to dismiss the listing of U+0020 SPACE and U+002A ASTERISK as not being used. It is easy to verify that those characters are used in numerous Arabic books and web pages.
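For anyone who wants to check usage against their own corpus, here is a minimal sketch in Python. The function name `count_code_points`, the list of code points, and the sample string are all illustrative assumptions, not part of alreq or CLDR; a real check would read actual Arabic books or web pages instead of the inline sample.

```python
import unicodedata
from collections import Counter

# Code points mentioned in this thread; the selection is illustrative only.
CODE_POINTS = [0x0020, 0x002A, 0x0022, 0x0671]


def count_code_points(text, code_points=CODE_POINTS):
    """Return {"U+XXXX NAME": count} for each requested code point in text."""
    counts = Counter(text)
    result = {}
    for cp in code_points:
        ch = chr(cp)
        label = f"U+{cp:04X} {unicodedata.name(ch)}"
        result[label] = counts[ch]  # Counter returns 0 for absent characters
    return result


# Hypothetical sample text; replace with the contents of a real corpus.
sample = "قال المعلم: * هذا مثال * بسيط"
for label, n in count_code_points(sample).items():
    print(label, n)
```

A count like this only shows that a character occurs in a given sample. It says nothing about whether the character belongs in a normative list, so it complements a published standard rather than replacing one.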
As for why Persian doesn't use U+0022 QUOTATION MARK, this is because quotation marks differ depending on the locale. See this Wikipedia table. They also need not be used in Arabic text, but since they're in the default layout, they are used as semantic quotes rather than typographical quotes (e.g., see https://github.com/jgm/pandoc/issues/10013).
https://www.w3.org/TR/alreq/#h_character_tables_punctuation_and_symbols
There are some characters that are not used for Arabic (like U+0020 SPACE and U+002A ASTERISK), and some characters that are not used for Persian (like U+0022 QUOTATION MARK). I wonder what the criteria are for selecting these characters?