w3c / predefined-counter-styles

Predefined Counter Styles
https://w3c.github.io/predefined-counter-styles/

Correction to the persian-alphabetic counter style #23

Open Huji opened 6 years ago

Huji commented 6 years ago

The persian-alphabetic counter style recommended in https://www.w3.org/TR/predefined-counter-styles/#arabic-styles does not match what is actually found in Persian literature. The first letter is never written as ا; instead, the spelled-out form الف‍ is used. So the first symbol needs to be changed from \627 to \627\644\641.
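To make the proposed change concrete, here is a minimal Python sketch that decodes the CSS escapes quoted above into the characters they denote. The helper name `decode_css_escapes` is hypothetical and only handles simple hex escapes; it is not part of any specification or library.

```python
import re
import unicodedata


def decode_css_escapes(text: str) -> str:
    """Replace CSS hex escapes such as \\627 with the characters they denote."""
    # A CSS hex escape is a backslash, 1 to 6 hex digits, and an optional
    # single whitespace character that terminates the escape.
    return re.sub(
        r"\\([0-9A-Fa-f]{1,6})\s?",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


current = decode_css_escapes(r"\627")            # the symbol in the spec today
proposed = decode_css_escapes(r"\627\644\641")   # the symbol proposed in this issue

for label, value in (("current", current), ("proposed", proposed)):
    names = [unicodedata.name(ch) for ch in value]
    print(label, value, names)
```

The current symbol is a single ALEF, while the proposed one spells out ALEF, LAM, FEH.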

svgeesus commented 6 years ago

From https://github.com/w3c/csswg-drafts/issues/2753, which includes a useful test on jsfiddle.

r12a commented 6 years ago

@behnam @shervinafshar @khaledhosny any comment on this?

Also note that the example on jsfiddle is not fixed in length. I have two questions about that:

  1. is that correct? (of course)
  2. what happens when you exceed the maximum counter with single characters: do the characters join or not?

I also note that the implementation from @Huji doesn't use a ZWJ after HEH. I checked various fonts and found that in isolation some will produce a 'round' HEH but others will produce one that looks like a joining HEH. There's also a systematic difference between the shapes with and without a ZWJ. Which is best?

[Image: screenshot comparing the shapes of HEH with and without ZWJ]

Huji commented 6 years ago

The correct form is not to use ZWJ, but rather to use the round form of heh. So of the options you show graphically, the one on the right is preferred. I believe that is also the one I used in the jsFiddle.

As for it not being fixed length: I have never seen a use case where more than 32 footnotes were mentioned alphabetically, therefore I have never seen something like الف‌الف or ب‌ب. I just updated the jsFiddle to use a fixed system: https://jsfiddle.net/a8obup7r/12/

shervinafshar commented 6 years ago

I strongly suggest collecting samples for this before making any specification decisions. My usual source for these matters is Adib-soltani and here is what I see there:

  1. it's never الف‍. it's always الف;
  2. dot is not used after the counter letter; rather ⁧‫الف)‬ than ‫الف.‬;
  3. no mention of the behavior for numbers larger than 32
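The two forms contrasted in point 1 can look identical on screen, because the difference may be an invisible joiner. The following Python sketch lists the code points of each form; it assumes that the rejected form ends with U+200D ZERO WIDTH JOINER, which is my reading of the text above and not something the thread states explicitly. The helper `code_points` is hypothetical.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Describe every character in text, including invisible ones."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# Assumption: the first form carries a trailing zero width joiner.
with_zwj = "\u0627\u0644\u0641\u200D"
plain = "\u0627\u0644\u0641"

for label, text in (("with ZWJ", with_zwj), ("plain", plain)):
    print(label)
    for description in code_points(text):
        print("  ", description)
```

Running a check like this on collected samples would show whether a trailing joiner is actually present.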

Also, a couple of other observations:

Abjad might end up tricky; some sources (much less credible than Adib-soltani) mention that 11, 12, and 13 should be یا‍ and یب and یج. I'm still researching this.

In "خلاصة السّیاق" (Seyed Hasan Ghajar Tafreshi, 1326 AH, Tehran), a tabulation of the abjads and their values is presented. Note the forms presented for م.

[Image: tabulation of the abjads and their values]

r12a commented 6 years ago

> The correct form is not to use ZWJ, but rather to use the round form of heh. So of the options you show graphically, the one on the right is preferred. I believe that is also the one I used in the jsFiddle.

@Huji I paid a little more attention and realised that the jsfiddle is using ھ [U+06BE ARABIC LETTER HEH DOACHASHMEE] for HEH, rather than ه [U+0647 ARABIC LETTER HEH]. I don't think the former is correct for Persian (it's used in Central Kurdish, Kashmiri, Luri, Western Panjabi, Sindhi, Saraiki, Urdu, and Uyghur, but not Persian as far as i'm aware).

This would then make my question about shaping of HEH moot. (I must admit i was surprised about the shaping - i should have looked closer.)
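Because the two HEH characters can look alike, a check based on code points is more reliable than visual inspection. Below is a minimal Python sketch that flags HEH-like letters other than U+0647 in a candidate symbol. The function name `unexpected_heh_variants` and the rule "anything named LETTER HEH other than U+0647" are assumptions for illustration, not an established test.

```python
import unicodedata

PERSIAN_HEH = "\u0647"  # ARABIC LETTER HEH


def unexpected_heh_variants(text: str) -> list[str]:
    """Return descriptions of HEH-like letters in text other than U+0647."""
    found = []
    for ch in text:
        name = unicodedata.name(ch, "")
        # Match on the Unicode character name instead of the glyph shape.
        if "LETTER HEH" in name and ch != PERSIAN_HEH:
            found.append(f"U+{ord(ch):04X} {name}")
    return found


print(unexpected_heh_variants("\u06BE"))        # HEH DOACHASHMEE is flagged
print(unexpected_heh_variants("\u0647\u200D"))  # HEH followed by ZWJ is not
```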

Huji commented 6 years ago

@r12a I think you are not correct. Heh Doachashmee is the form of Heh I have seen used when the letter is presented in isolation (both in Abjad and in non-Abjad usage of the letter) in Persian books. I will try to find an example in the (few) books I have at hand.

r12a commented 6 years ago

@behnam @shervinafshar any comment on use of heh doachashmee?

behnam commented 6 years ago

Unicode is full of codepoints with glyphs that can look similar to what we want, or that have a name that sounds like what we want, but neither of those is a reliable criterion for deciding which codepoints should be used for which purposes.

According to ISIRI 6219 (http://persian-computing.org/references/ISIRI/ISIRI-6219.html, in Persian), the only Unicode codepoint to be used in Persian text for the Persian letter Heh is U+0647. (Also pay attention to the HEH+ZWJ representation of the letter in Table 5 of the standard.)

The ISIRI 6219 specification is based on Unicode recommendations and conversion tables from other Persian encodings/character sets to/from Unicode.


That said, any claim that Heh Doachashmee is preferred in Persian needs more evidence, in the form of specifications or data.

Huji commented 6 years ago

As a non-expert, I am unable to produce such evidence or data.

All I can do is point out that in Table 5 of ISIRI 6219, the letter "Heh" is not shown in its separate form. Whether it is shown as HEH+ZWJ (as Behnam says) is speculation, since the document itself does not provide evidence that it is HEH+ZWJ either. Footnote 1 (right after the table) states that when there are multiple ways to produce the same glyph, the form that uses only a single Unicode character is preferred (so if both HEH+ZWJ and HEH DOACHASHMEE are options, the latter is preferred).

With that said, I am okay with either choice, i.e. if we keep it as HEH+ZWJ it is totally fine too, as far as I am concerned.

r12a commented 6 years ago

I might as well add, at this point, that Unicode's CLDR doesn't list HEH DOACHASHMEE as a character used in Persian, either (see https://www.unicode.org/cldr/charts/latest/summary/fa.html)

r12a commented 3 years ago

Here's an attempt to summarise where i think we are with this thread:

  1. @Huji proposes to change \627 to \627\644\641 for the first value. It would be good to have some evidence of this usage so we can feel confident in making the change.
  2. We confirmed that fixed is the right type for persian-abjad and persian-alphabetic styles.
  3. We believe that counter 5 in the alphabetic style is ه‍ [U+0647 ARABIC LETTER HEH + U+200D ZERO WIDTH JOINER]
  4. There appears to be a possibility that 11, 12, and 13 should be یا‍ and یب and یج.
  5. @shervinafshar 's image shows that in the abjad style 13 could be مـ [U+0645 ARABIC LETTER MEEM + U+0640 ARABIC TATWEEL]

We're still awaiting further information related to points 1, 4, and 5 before making changes to the doc.
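For readers unfamiliar with the fixed type mentioned in point 2, the following Python sketch shows the general behaviour: each counter value maps to exactly one symbol, and values outside the list fall back to another style. It assumes the 32 letters of the Persian alphabet in their usual order as the symbol list and plain decimal digits as the fallback; the joiner and spelled-out variants still under discussion in this thread are deliberately left out, and `fixed_counter` is a hypothetical helper.

```python
# Assumption: the 32 Persian letters in alphabetical order, without any
# ZWJ or spelled-out forms discussed in this thread.
PERSIAN_ALPHABET = [
    chr(cp)
    for cp in (
        0x0627, 0x0628, 0x067E, 0x062A, 0x062B, 0x062C, 0x0686, 0x062D,
        0x062E, 0x062F, 0x0630, 0x0631, 0x0632, 0x0698, 0x0633, 0x0634,
        0x0635, 0x0636, 0x0637, 0x0638, 0x0639, 0x063A, 0x0641, 0x0642,
        0x06A9, 0x06AF, 0x0644, 0x0645, 0x0646, 0x0648, 0x0647, 0x06CC,
    )
]


def fixed_counter(value: int, symbols: list[str] = PERSIAN_ALPHABET, first: int = 1) -> str:
    """Return the symbol for value in a fixed system, else a decimal fallback."""
    index = value - first
    if 0 <= index < len(symbols):
        return symbols[index]
    # Outside the symbol list nothing repeats or combines; fall back.
    return str(value)


for n in (1, 2, 32, 33):
    print(n, fixed_counter(n))
```

With a fixed system, doubled forms for counter 33 and above never arise, which matches the observation earlier in the thread that such forms have not been seen in practice.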

shervinafshar commented 3 years ago

1: is a no-go, as far as I can assess.
4: I think I missed the resources elaborating on this possibility. May I ask for the link again, @r12a?
5: is correct per the resources at hand. Is there a suggestion to do more research?

r12a commented 3 years ago

> 4: I think I missed the resources elaborating on this possibility. May I ask for the link again, @r12a?

hi @shervinafshar, this comes from your comment above https://github.com/w3c/predefined-counter-styles/issues/23#issuecomment-396855220

> Abjad might end up tricky; some sources (much less credible than Adib-soltani) mention that 11, 12, and 13 should be یا‍ and یب and یج. I'm still researching this.

shervinafshar commented 3 years ago

Thanks. Sorry I missed that. I'll do some research and get back in a week or so.

shervinafshar commented 3 years ago

I checked two implementations of abjad numbered lists (Polyglossia and XePersian) and both confirm my suspicion that 11 and above should be constructed:

11: یا
20: ک
30: ل
31: لا
52: نب
...
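The construction behind these values can be sketched as an additive system: pick the largest letter value that fits, subtract it, and repeat. The Python sketch below assumes the conventional abjad letter values and the Persian forms ی and ک; it is not taken from Polyglossia or XePersian, it ignores the joining forms (ZWJ, tatweel) discussed earlier, and `to_abjad` is a hypothetical name.

```python
# Assumption: conventional abjad letter values, largest first.
ABJAD_VALUES = [
    (1000, "\u063A"), (900, "\u0638"), (800, "\u0636"), (700, "\u0630"),
    (600, "\u062E"), (500, "\u062B"), (400, "\u062A"), (300, "\u0634"),
    (200, "\u0631"), (100, "\u0642"), (90, "\u0635"), (80, "\u0641"),
    (70, "\u0639"), (60, "\u0633"), (50, "\u0646"), (40, "\u0645"),
    (30, "\u0644"), (20, "\u06A9"), (10, "\u06CC"), (9, "\u0637"),
    (8, "\u062D"), (7, "\u0632"), (6, "\u0648"), (5, "\u0647"),
    (4, "\u062F"), (3, "\u062C"), (2, "\u0628"), (1, "\u0627"),
]


def to_abjad(value: int) -> str:
    """Build an abjad numeral additively, largest letter value first."""
    if not 1 <= value <= 1999:
        raise ValueError("this sketch only covers 1 to 1999")
    letters = []
    remaining = value
    for weight, letter in ABJAD_VALUES:
        # Within 1..1999 each letter is needed at most once.
        if remaining >= weight:
            letters.append(letter)
            remaining -= weight
    return "".join(letters)


for n in (11, 20, 30, 31, 52):
    print(n, to_abjad(n))
```

For the five values listed above, this produces the same letter sequences.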

I generated two PDFs (Polyglossia, XePersian) with the tabulation from both packages.

It should be noted that these packages have other issues in generating abjad numbered lists, which are outside the scope of this issue but worth pointing out; e.g. XePersian uses آ rather than ا in position 1, which results in oddities like لآ for 31; Polyglossia uses ي in place of ی and ك in place of ک.