w3c / alreq

Documenting gaps and requirements for support of Arabic and Persian on the Web and in eBooks.

Is a talur yeh inside a lexical item followed by a space? #270

Open r12a opened 7 months ago

r12a commented 7 months ago

In the middle of the lexical items in the following excerpt from a dictionary, the Kashmiri text uses the final or isolate form of KASHMIRI YEH, rather than a medial or initial form (which would have a circle below a simple base). The Latin transcription sometimes shows a lexical item as two parts separated by a space, and sometimes shows no space. As this is handwritten text, it is not clear whether the final form appears because there is always a space between the two morphological items, or whether it is produced without a following space. Other locations in the dictionary show similar situations.

[Image: screenshot of the dictionary excerpt, 2023-11-07]

In Wiktionary there are also lexical items made up of two such parts, and a final KASHMIRI YEH is always followed by a space.

The following text implies that there should be a space.

[Image: screenshot of the text referred to above]

However, there is a discussion on the Unicode site about whether the final form should also be produced without a following space. This may affect the future implementation of palatalisation.

Can anyone clarify whether printed text should or should not have a space after the final form of KASHMIRI YEH?
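For anyone who wants to check digital text (for example, entries copied from Wiktionary) rather than judge by eye, here is a minimal Python sketch that reports what immediately follows each KASHMIRI YEH (U+0620) in a string. The function name, the labels, and the sample string are assumptions made for illustration. The sample is built from escapes and is not taken from the dictionary, and the sketch says nothing about what the correct orthography is.

```python
import unicodedata

KASHMIRI_YEH = "\u0620"  # ARABIC LETTER KASHMIRI YEH
ZWNJ = "\u200c"          # ZERO WIDTH NON-JOINER


def followers_of_kashmiri_yeh(text):
    """Return (index, label) for each KASHMIRI YEH found in text.

    The label describes the character right after it. Hypothetical helper,
    written only for this illustration.
    """
    results = []
    for i, ch in enumerate(text):
        if ch != KASHMIRI_YEH:
            continue
        if i + 1 == len(text):
            results.append((i, "end of text"))
            continue
        nxt = text[i + 1]
        if nxt == ZWNJ:
            label = "ZWNJ"
        elif nxt.isspace():
            label = "space"
        elif unicodedata.category(nxt).startswith("M"):
            label = "combining mark"
        else:
            label = "letter or other"
        results.append((i, label))
    return results


# Illustrative two-part item: HEH GOAL WITH HAMZA ABOVE, SEEN, KASHMIRI YEH,
# SPACE, TEH, KASRA, NOON, KASHMIRI YEH (assumed spelling, built from escapes).
sample = "\u06c2\u0633\u0620 \u062a\u0650\u0646\u0620"

for index, label in followers_of_kashmiri_yeh(sample):
    print(index, label)
```

Running this over a word list would show whether a word-internal KASHMIRI YEH is followed by a space, by a ZWNJ, or directly by another letter in that particular source.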

r12a commented 7 months ago

A number of points perhaps worth mentioning:

  1. The choice of gol ye (medial form with circle below) vs taler ye (final swash form) coincides 100% with the place in the syllable where palatalisation takes place. Palatalisation as part of a syllable onset uses gol ye, while palatalisation after a syllable coda uses taler ye. Therefore, if a non-final syllable with a coda appears in a lexical item, taler ye will appear inside that lexical item.
  2. The pattern of gol ye for initial/medial forms of KASHMIRI YEH and taler ye swash forms for final/isolate forms is identical to the pattern found in FARSI YEH, which is the other yeh character in Kashmiri, used for glides and vowels. That is, both types of yeh use sub-base ijam diacritics in initial/medial shapes, and swash forms for final/isolate shapes.
  3. Other lexical items which don't involve palatalisation are also sometimes split into parts where syllable codas have a final shape, eg. عَمَل کَرٕنؠ (amal karɨnʲ), دَگ دار (dag daːr), أش دَر (əʃ dar), etc. This is not exclusively a pattern related to palatalisation.
  4. Persian orthography has something similar in terms of lexical items containing final forms (although there is no space following, and this is just about the joining patterns). This tends to happen, for example, when writing suffixes, such as in خانه‌ها (xɒːne-hɒː), where the final form of heh in word-medial position is produced using a ZWNJ (zero-width non-joiner).
  5. It is often difficult in nastaliq text, where spaces are very thin and adjacent character glyphs often overlap, to tell whether a non-left-joining character is or is not followed by a space.
  6. Latin transcriptions are typically inconsistent about whether parts of the lexical item are or are not separated by a space, and so can't be relied upon. That said, many transcriptions do show a space between the parts of the lexical item.
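
To make points 2 and 4 concrete at the code point level, here is a small Python sketch. It lists the characters in the Persian example from point 4, showing that the word-internal final form comes from a ZWNJ and not from a space, and it names the two yeh characters from point 2. The helper name `describe` is an assumption for illustration, and the word is spelled out with escapes on the assumption that it matches the example above.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# The Persian example spelled out with escapes:
# KHAH, ALEF, NOON, HEH, ZWNJ, HEH, ALEF
word = "\u062e\u0627\u0646\u0647" + ZWNJ + "\u0647\u0627"


def describe(text):
    """Return one 'U+XXXX NAME' string per character (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


for line in describe(word):
    print(line)

# The two yeh characters used in Kashmiri, as named in the Unicode Standard.
for line in describe("\u0620\u06cc"):
    print(line)
```

The ZWNJ breaks the cursive join without adding any visible gap, which is the difference from the Kashmiri cases in question, where a space may or may not be present.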

r12a commented 6 months ago

An additional question would be: If a lexical item such as ۂسؠ تِنؠ /həsʲ tinʲ/ elephant won't fit completely at the end of a line, is it necessary to prevent a line break after the first KASHMIRI YEH, or is it ok to wrap the characters after the first KASHMIRI YEH to the next line?