w3c / alreq

Documenting gaps and requirements for support of Arabic Script languages on the Web and in eBooks.

Characters that are not used for Arabic/Persian #277

Open xfq opened 6 months ago

xfq commented 6 months ago

https://www.w3.org/TR/alreq/#h_character_tables_punctuation_and_symbols

There are some characters that are not used for Arabic (like U+0020 SPACE and U+002A ASTERISK), and some characters that are not used for Persian (like U+0022 QUOTATION MARK). I wonder what the criteria are for selecting these characters?

shervinafshar commented 5 months ago

For Persian, a standard is available: ISIRI-9147 (pp. 17-19 of the PDF). For Arabic, we couldn't find such a document and, if I recall correctly, we relied on CLDR data; the case of U+0020 for Arabic seems to be an error. We probably need to revisit this section for Arabic.

Also, in case you were unaware, we provisionally recorded our non-normative references in a spreadsheet here, with the aim of eventually migrating the content to the document. I added #278.

avidseeker commented 3 months ago

"The following tables list Unicode characters used for Arabic script."

What does this mean? Is it that these characters are available in Arabic keyboard layouts, or that they're commonly used in online Arabic texts? And which variety of Arabic is meant (Classical Arabic or Modern Standard Arabic)? This needs to be clarified.

Until such a standard for Arabic is published, it is safe to dismiss the claim that U+0020 SPACE and U+002A ASTERISK are not used. It is easy to verify that both characters appear in numerous Arabic books and web pages.
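As a rough illustration of how such a check could be done, here is a minimal Python sketch that counts occurrences of given code points in a piece of text. The function name `count_chars` and the sample sentence are hypothetical, made up for this example; a real check would run over an actual Arabic corpus.

```python
import unicodedata
from collections import Counter


def count_chars(text, codepoints):
    """Return how often each of the given code points occurs in text."""
    counts = Counter(text)
    return {
        f"U+{cp:04X} {unicodedata.name(chr(cp))}": counts[chr(cp)]
        for cp in codepoints
    }


# Hypothetical sample sentence (an assumption, not taken from any corpus).
sample = "قال المعلم: اقرأ الفصل الأول * ثم أجب عن الأسئلة."

result = count_chars(sample, [0x0020, 0x002A])
for label, n in result.items():
    print(label, n)
```

A non-zero count in one sample proves little on its own, but the same function applied to a larger body of Arabic text would give a simple, reproducible basis for the table.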

As for why Persian doesn't use U+0022 QUOTATION MARK, this is because quotation marks differ depending on the locale. See this Wikipedia table. They need not be used in Arabic text either, but since they're in the default keyboard layout, they are used as semantic quotes rather than typographical quotes (e.g., see https://github.com/jgm/pandoc/issues/10013).
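To make the semantic vs. typographical distinction concrete, here is a small Python sketch that treats U+0022 as a semantic quote and replaces it with locale-specific marks. The `QUOTES` table and the function name `typographic_quotes` are hypothetical, and the Persian guillemets are my assumption; the actual marks should be checked against ISIRI-9147 or CLDR. The open/close alternation is deliberately naive and ignores nesting and apostrophes.

```python
# Illustrative mapping only (an assumption); verify against CLDR or ISIRI-9147.
QUOTES = {
    "fa": ("«", "»"),
    "en": ("\u201C", "\u201D"),
}


def typographic_quotes(text, locale):
    """Replace straight double quotes with the locale's opening/closing marks."""
    open_q, close_q = QUOTES[locale]
    out = []
    opening = True  # the next U+0022 is treated as an opening quote
    for ch in text:
        if ch == '"':
            out.append(open_q if opening else close_q)
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)


print(typographic_quotes('او گفت "سلام"', "fa"))
```

Under this reading, the straight quote typed on the keyboard only marks where a quotation starts and ends, and the rendered glyph is chosen per locale.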