Open Huji opened 6 years ago
From https://github.com/w3c/csswg-drafts/issues/2753 which includes a useful test on jsfiddle
@behnam @shervinafshar @khaledhosny any comment on this?
Also note that the example on jsfiddle is not fixed in length. I have two questions about that:
I also note that the implementation from @Huji doesn't use a ZWJ after HEH. I checked various fonts and found that in isolation some will produce a 'round' HEH but others will produce one that looks like a joining HEH. There's also a systematic difference between the shapes with and without a ZWJ. Which is best?
The correct form is not to use ZWJ, but rather to use the round form of heh. So of the options you show graphically, the one on the right is preferred. I believe that is also the one I used in the jsFiddle.
As for it not being fixed length: I have never seen a use case where more than 32 footnotes were mentioned alphabetically, therefore I have never seen something like الفالف or بب. I just updated the jsFiddle to use a fixed system: https://jsfiddle.net/a8obup7r/12/
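To make the difference between the two approaches concrete, here is a rough Python sketch of how the CSS `alphabetic` and `fixed` counter systems behave. The two-symbol list is a shortened, hypothetical stand-in for the real symbol list, and the decimal fallback is an assumption (it is the CSS default, but a style can name a different one).

```python
# Shortened, hypothetical symbol list (the real style has many more letters).
SYMBOLS = ["الف", "ب"]


def alphabetic(value: int, symbols: list[str]) -> str:
    """Bijective numeration: after the last symbol, symbols start to repeat."""
    if value < 1:
        raise ValueError("alphabetic counters are only defined for value >= 1")
    n = len(symbols)
    out = ""
    while value != 0:
        value -= 1
        out = symbols[value % n] + out
        value //= n
    return out


def fixed(value: int, symbols: list[str], first: int = 1) -> str:
    """Run through the symbols once, then hand over to a fallback style."""
    index = value - first
    if 0 <= index < len(symbols):
        return symbols[index]
    return str(value)  # assumption: the fallback style is decimal


for i in range(1, 7):
    print(i, alphabetic(i, SYMBOLS), fixed(i, SYMBOLS))
```

With this short list, `alphabetic` produces الفالف for 3 and بب for 6, which is exactly the kind of repetition described above, while `fixed` stops at the end of the list and falls back to digits.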
I strongly suggest collecting samples for this before making any specification decisions. My usual source for these matters is Adib-soltani and here is what I see there:
الف
. it's always الف
;الف)
than الف.
Also, a couple of other observations:
Abjad might end up tricky; some sources (much less credible than Adib-soltani) mention that 11, 12, and 13 should be یا and یب and یج. I'm still researching this.
In "خلاصة السّیاق" (Seyed Hasan Ghajar Tafreshi, 1326 AH, Tehran), a tabulation of the abjads and their values is presented. Note the forms presented for م.
The correct form is not to use ZWJ, but rather to use the round form of heh. So of the options you show graphically, the one on the right is preferred. I believe that is also the one I used in the jsFiddle.
@Huji I paid a little more attention and realised that the jsfiddle is using ھ [U+06BE ARABIC LETTER HEH DOACHASHMEE] for HEH, rather than ه [U+0647 ARABIC LETTER HEH]. I don't think the former is correct for Persian (it's used in Central Kurdish, Kashmiri, Luri, Western Panjabi, Sindhi, Saraiki, Urdu, and Uyghur, but not Persian as far as i'm aware).
This would then make my question about shaping of HEH moot. (I must admit i was surprised about the shaping - i should have looked closer.)
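For anyone who wants to check which code point a given sample actually contains, a small Python sketch using the standard `unicodedata` module is below. The helper name `describe` is made up for illustration.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in the string."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The two HEH candidates discussed here, plus HEH followed by ZWJ.
for sample in ("\u0647", "\u06be", "\u0647\u200d"):
    print(describe(sample))
```

The two letters can look alike in some fonts, so inspecting the code points is more reliable than judging by the rendered glyph.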
@r12a I think you are not correct. Heh Doachashmee is the form of Heh I have seen used when the letter is presented in isolation (both in Abjad and in non-Abjad usage of the letter) in Persian books. I will try to find an example in the (few) books I have at hand.
@behnam @shervinafshar any comment on use of heh doachashmee?
Unicode is full of codepoints with glyphs that can look similar to what we want, or may have a name that sounds like what we want, but neither of those are accurate parameters in deciding which codepoints shall be used for which purposes.
According to ISIRI 6219 (http://persian-computing.org/references/ISIRI/ISIRI-6219.html, in Persian), the only Unicode codepoint to be used in Persian text for the Persian letter Heh is U+0647. (Also pay attention to the HEH+ZWJ representation of the letter in Table 5 of the standard.)
The ISIRI 6219 specification is based on Unicode recommendations and conversion tables from other Persian encodings/character sets to/from Unicode.
That said, any claim that Heh Doachashmee is preferred in Persian needs more evidence, in the form of specifications or data.
As a non-expert, I am unable to produce such evidence or data.
All I can say is that in Table 5 of ISIRI 6219, the letter "Heh" is not shown in its separate form. Whether it is shown as HEH+ZWJ (as Behnam says) is speculation; the document itself does not provide evidence that it is HEH+ZWJ either. Footnote 1 (right after the table) states that when there are multiple ways to produce the same glyph, the form that uses only a single Unicode character is preferred (so if both HEH+ZWJ and HEH DOACHASHMEE are options, the latter is preferred).
With that said, I am okay with either choice, i.e. if we keep it as HEH+ZWJ it is totally fine too, as far as I am concerned.
I might as well add, at this point, that Unicode's CLDR doesn't list HEH DOACHASHMEE as a character used in Persian, either (see https://www.unicode.org/cldr/charts/latest/summary/fa.html)
Here's an attempt to summarise where i think we are with this thread:
We're still awaiting further information related to points 1, 4, and 5 before making changes to the doc.
1: is no go, as far as I can assess. 4: I think I missed the resources elaborating on this possibility. May I ask for the link again, @r12a? 5: is correct per resources at hand. Is there a suggestion to do more research?
4: I think I missed the resources elaborating on this possibility. May I ask for the link again, @r12a?
hi @shervinafshar, this comes from your comment above https://github.com/w3c/predefined-counter-styles/issues/23#issuecomment-396855220
Abjad might end up tricky; some sources (much less credible than Adib-soltani) mention that 11, 12, and 13 should be یا and یب and یج. I'm still researching this.
Thanks. Sorry to miss that. I'll do some research and get back in a week or so.
I checked two implementations of abjad numbered lists (Polyglossia and XePersian) and both confirm my suspicion that 11 and above should be constructed:
11: یا
20: ک
30: ل
31: لا
52: نب
...
I generated two PDFs (Polyglossia, XePersian) with the tabulation from both packages.
It should be noted that these packages have other issues in generating abjad numbered lists. These are outside the scope of this issue but worth pointing out: e.g. XePersian uses آ rather than ا in position 1, which ends up in oddities like لآ for 31; Polyglossia uses ي in place of ی and ك in place of ک.
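As a rough illustration of what "constructed" means here, the Python sketch below builds abjad numerals for 1 to 99 additively, tens letter first, then units letter. The letter values are the commonly cited abjad values and are an assumption on my part, not taken from either package; the sketch uses plain ا for 1 and the Persian forms ی (U+06CC) and ک (U+06A9).

```python
# Assumed abjad values for tens and units, largest first.
ABJAD = [
    (90, "ص"), (80, "ف"), (70, "ع"), (60, "س"), (50, "ن"),
    (40, "م"), (30, "ل"),
    (20, "\u06a9"),  # ک ARABIC LETTER KEHEH (Persian form)
    (10, "\u06cc"),  # ی ARABIC LETTER FARSI YEH (Persian form)
    (9, "ط"), (8, "ح"), (7, "ز"), (6, "و"),
    (5, "\u0647"),   # ه ARABIC LETTER HEH
    (4, "د"), (3, "ج"), (2, "ب"),
    (1, "\u0627"),   # ا plain ALEF, not آ
]


def abjad(value: int) -> str:
    """Additive construction: at most one tens letter, then one units letter."""
    if not 1 <= value <= 99:
        raise ValueError("this sketch only covers 1..99")
    out = ""
    for weight, letter in ABJAD:
        if value >= weight:
            out += letter
            value -= weight
    return out


for n in (11, 20, 30, 31, 52):
    print(n, abjad(n))
```

This reproduces the five values tabulated above; hundreds and 1000 are left out on purpose, since nothing in this thread covers them.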
The persian-alphabetic counter style recommended in https://www.w3.org/TR/predefined-counter-styles/#arabic-styles does not match what is actually found in Persian literature. The first letter is never used as ا but instead, a spelled-out version الف is used. So the first symbol needs to be changed from \627 to \627\644\641.
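As a quick sanity check on those escapes, the Python sketch below turns a string into CSS-style hexadecimal escapes. The helper name `css_escape` is made up for illustration, and it escapes every character, so no separating spaces are needed between escapes.

```python
def css_escape(text: str) -> str:
    """Write each code point as a backslash followed by its hex value."""
    return "".join(f"\\{ord(ch):x}" for ch in text)


print(css_escape("\u0627"))              # ا alone
print(css_escape("\u0627\u0644\u0641"))  # الف spelled out
```

The first call gives `\627` and the second gives `\627\644\641`, matching the current and proposed first symbol.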