w3c / alreq

Documenting gaps and requirements for support of Arabic and Persian on the Web and in eBooks.

Missing terms in glossary #130

Closed. r12a closed this issue 5 years ago.

r12a commented 7 years ago

Page 372 of the Unicode core text defines a couple of terms which I was surprised to find missing from the alreq glossary:

ijam, tashkil

Maybe we should add those, with a condensed description per Unicode.

(Also, there's a gap in the table after mabsut which needs to be fixed.)

r12a commented 7 years ago

Actually, harakat and tanwin are also missing. They may also be useful.

Not sure whether we need to add terms like ezafe (Urdu izāfat).

behnam commented 7 years ago

A few more terms from Arabic-script Paleography, which I'm not sure if we want to get to, but good to keep track of here:

From https://ar.wikipedia.org/wiki/%D8%A5%D8%B9%D8%AC%D8%A7%D9%85:

ntounsi commented 7 years ago

@r12a, OK to add the following terms to the glossary:

Ijam : Diacritical marks applied to a basic letter shape (or skeleton) to derive a new letter. For example a dot under a "curve" to get the letter Bah.

Tashkil : Marks that are added to letters to indicate vocalisation of text or to correct pronunciation.

Harakat : The basic ones among these vowel marks.

Tanwin : (Derived from Noon). An extra Noon pronounced at the end of a word, and indicated by doubling the sign of one of the diacritics Fatha or Damma or Kasra.

(Also, there's a gap in the table after mabsut which needs to be fixed.)

I'll fix it at the same time.

ntounsi commented 7 years ago

"Tashkil" is mentioned as "diacritical marks".

r12a commented 7 years ago

@ntounsi here are some suggested alterations and a couple of additions. I think it's useful to mention what are and are not combining characters in Unicode (particularly in case anyone with an IDN background is reading this).

Ijam : Diacritical marks applied to a basic letter shape (or skeleton) to derive a new letter. For example a dot under a "curve" to get the letter BEH. In Unicode each letter plus ijam combination is encoded as a separate, atomic character.

Tashkil : Marks that are added to letters to indicate vocalisation of text or to correct pronunciation. In Unicode these are all combining characters applied to a base character.

Harakat : Tashkil marks representing short vowel sounds.

Tanwin : (Derived from Noon). Tashkil marks indicating postnasalized or long vowels at the end of a word, and indicated by doubling the sign of one of the harakat diacritics.

Shadda: A tashkil mark indicating gemination of the base consonant.

Sukun: A tashkil mark indicating the lack of a vowel after the consonant to which it is attached.
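The distinction between atomic letters and combining marks can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the choice of sample characters and the helper name `describe` are illustrative assumptions, not part of the proposed glossary text.

```python
import unicodedata

# Sample letters whose dots (ijam) are part of the encoded character.
# The selection is illustrative only.
letters = ["\u0628", "\u062A", "\u062B"]  # BEH, TEH, THEH

# Sample tashkil marks: three harakat, three tanwin, shadda, sukun.
tashkil = [
    "\u064E", "\u064F", "\u0650",  # FATHA, DAMMA, KASRA
    "\u064B", "\u064C", "\u064D",  # FATHATAN, DAMMATAN, KASRATAN
    "\u0651", "\u0652",            # SHADDA, SUKUN
]

def describe(ch):
    """Return (name, general category, combining class, decomposition)."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
        unicodedata.decomposition(ch),
    )

for ch in letters + tashkil:
    name, category, ccc, decomposition = describe(ch)
    print(f"U+{ord(ch):04X} {name}: category={category}, ccc={ccc}, "
          f"decomposition={decomposition!r}")
```

For the sampled letters, the general category is `Lo` with an empty decomposition, so the dots cannot be separated from the skeleton by normalization. The sampled tashkil marks are nonspacing marks (`Mn`) with a non-zero combining class, which means they attach to a preceding base character.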

wdyt?

ntounsi commented 7 years ago

Looks good to me.

behnam commented 7 years ago

I'm proposing to add:

behnam commented 7 years ago

@shervinafshar, the spreadsheet is ready to be published again. Thanks!

ntounsi commented 7 years ago

I'm counting on @shervinafshar for publication, then.