Why are bidi categories of Arabic-indic & Eastern Arabic numbers different?

w3c / alreq

Documenting gaps and requirements for support of Arabic and Persian on the Web and in eBooks.



Why are bidi categories of Arabic-indic & Eastern Arabic numbers different? #85

Open: ntounsi opened this issue 7 years ago

ntounsi commented 7 years ago

Do all digits share the same bidi category and directional property?

Digit 2 (U+0032) has category "EN, European Number". OK.
Arabic-Indic digit ٢ (U+0662) has category "AN, Arabic Number". OK.
But the other ۲ (U+06F2), its Eastern Arabic-Indic counterpart (named EXTENDED ARABIC-INDIC DIGIT TWO in Unicode), has category "EN, European Number", like digit 2. Is there a reason for this difference between the last two?
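The categories quoted above can be checked with Python's standard `unicodedata` module, whose `bidirectional()` function returns a character's Bidi_Class. This is a minimal sketch; the results come from the Unicode Character Database bundled with the running Python version.

```python
import unicodedata

# The three "two" digits discussed in the issue.
for ch in ("2", "\u0662", "\u06F2"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}")

# Collect the bidi classes of the whole digit ranges.
arabic_indic = {unicodedata.bidirectional(chr(cp)) for cp in range(0x0660, 0x066A)}
extended_arabic_indic = {unicodedata.bidirectional(chr(cp)) for cp in range(0x06F0, 0x06FA)}
print(arabic_indic, extended_arabic_indic)
```

The first loop prints `EN`, `AN` and `EN` for the three digits. The two sets show that the split applies to every digit in U+0660..U+0669 (all `AN`) and U+06F0..U+06F9 (all `EN`), not just to the digit two.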

There is also a difference in bidi behavior: in an RTL context, the same text a2b is displayed as b2a if the two is an Arabic Number, but as a2b if it is a European Number (just like "a 2 b"). Aren't ALL numbers WEAK in directional property?
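EN and AN are both weak types in UAX #9, but the weak-type rules resolve them differently depending on the nearest preceding strong character. The sketch below covers only three of those rules, W2, W3 and W7. It assumes the input is a list of bidi class strings for a single run with a given start-of-sequence type `sos`. The function name `resolve_weak` is hypothetical, and the sketch leaves out rules W1 and W4 to W6, neutrals, brackets, isolates and embedding levels.

```python
# Simplified sketch of UAX #9 weak-type rules W2, W3 and W7 only.
# Assumption: `classes` holds bidi class strings for one run; `sos` is "L" or "R".
STRONG = {"L", "R", "AL"}

def resolve_weak(classes, sos):
    """Hypothetical helper: apply W2, W3 and W7 to a list of bidi classes."""
    types = list(classes)

    # W2: an EN whose nearest preceding strong type is AL becomes AN.
    last_strong = sos
    for i, t in enumerate(types):
        if t in STRONG:
            last_strong = t
        elif t == "EN" and last_strong == "AL":
            types[i] = "AN"

    # W3: every AL becomes R.
    types = ["R" if t == "AL" else t for t in types]

    # W7: an EN whose nearest preceding strong type (or sos) is L becomes L.
    last_strong = sos
    for i, t in enumerate(types):
        if t in ("L", "R"):
            last_strong = t
        elif t == "EN" and last_strong == "L":
            types[i] = "L"
    return types
```

For example, `resolve_weak(["AL", "EN"], "R")` returns `["R", "AN"]`. An EN digit such as ۲ after an Arabic letter therefore resolves to AN, exactly like ٢. Between Latin letters, `resolve_weak(["L", "EN", "L"], "R")` returns `["L", "L", "L"]`, whereas `resolve_weak(["L", "AN", "L"], "R")` leaves the AN unchanged. The EN/AN category difference therefore matters mainly outside Arabic-letter context.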

duerst commented 7 years ago

As far as I remember, the difference in bidi category between Arabic-Indic digits and Eastern Arabic-Indic digits is due to the difference in bidi behavior desired in Arabic vs. Persian. Details should be available from Unicode.

ntounsi commented 7 years ago

From http://unicode.org/reports/tr9/#AN, section 3.2, Bidirectional Character Types:

"[...] As of Unicode 4.0, the Bidirectional Character Types of a few Indic characters were altered so that the Bidirectional Algorithm preserves canonical equivalence. That is, two canonically equivalent strings will result in equivalent ordering after applying the algorithm."

I guess the "few Indic characters" are the Eastern Arabic-Indic digits in the range U+06F0..U+06F9, which are classified as "European Number" rather than "Arabic Number". I wonder what the "canonical equivalence" problem in question is. I didn't find more details.

khaledhosny commented 7 years ago

I think it is referring to characters used for Indic languages, not to the Arabic-Indic digits, which AFAIK have had this distinction from the start.

ebraminio commented 7 years ago

@shervinafshar and I discussed this years ago here: https://groups.google.com/forum/#!topic/persian-computing/602gqTIrlPQ, because I found the Extended Arabic-Indic digits better suited to our use in one special case (though the other set may be better in other cases).

I remember @roozbehp (who I guess won't get pinged by my mention here) writing, somewhere in a very old mailing-list discussion (around 2001?), that he was explaining to a developer why these are different. If my memory is correct, he may be a good person to ask about the reason for the difference.