interscript / interscript-ruby

Interoperable script conversion systems (ISCS) with the `interscript` gem

Provide feedback about ISO 233-1 (Arabic transliteration) to ISO/TC 46/WG 3 #743

Open ronaldtse opened 3 years ago

ronaldtse commented 3 years ago

The ISO committee that created ISO 233-1 wants to know what problems we encountered in implementing 233-1, e.g. ambiguous target characters.

Do we have a list of decisions made and a list of open issues?

AhMohsen46 commented 2 years ago

Hello @ronaldtse

I've been reviewing it over the last 2 days

1- I searched for any open/closed issues we had for 233. I found that 233-1 had no open issues; only 233-3 had this issue: https://github.com/interscript/maps/issues/56

2- Started reviewing the available docs:

doc A) http://transliteration.eki.ee/pdf/Arabic_2.2.pdf

I- We can see here that ة (ta' marbuta) is always transliterated in ISO 233-1984 to the character shown in the attached image.

It can be found in: https://github.com/interscript/lcs/blob/3f1ea91afe73bd7bbcb2f1d703df47e10918c2f6/maps/maps/iso-ara-Arab-Latn-233-1984.imp#L103-L104

II- Sukun is not omitted (see the attached images). Is ْ an English/Latin character? We are omitting it anyway here: https://github.com/interscript/lcs/blob/3f1ea91afe73bd7bbcb2f1d703df47e10918c2f6/maps/maps/iso-ara-Arab-Latn-233-1984.imp#L67 and here: https://github.com/interscript/lcs/blob/3f1ea91afe73bd7bbcb2f1d703df47e10918c2f6/maps/maps/iso-ara-Arab-Latn-233-1984.imp#L83-L84

doc B) http://www.eki.ee/wgrs/rom1_ar.pdf (which is probably what we used when building the map). Not a lot of comments on this one. It does not show many rules for ISO 233-1984, though, as it mainly covers UN-2017 and shows the differences from other systems (233, SES-1930, etc.).

AhMohsen46 commented 2 years ago

Created a new PR to match the doc provided: https://github.com/interscript/maps/pull/169 (we may postpone merging it until we hear back from ISO).

Notes on the new doc:

Table (recommendations per letter, in the order of the letters' table):

1- ا (alef): how do we decide between ' and ’?

2- Not sure about ء ` (02c8); in the original doc we used, it was ’ (2019).

5- ث (tha'): IMO it should be pronounced as (th); I am not sure if (ṯ) sounds the same.

6- ج maps to j, but it can be g in some accents; in original Arabic, however, it is pronounced as j.

14- ش maps to (sh) as in "short/shore/shall/should".

30- sub "\u064b", "á" # ً should be transliterated to an

31- sub "\u064c", "ú" # ٌ should be transliterated to un

32- sub "\u064d", "in" # ٍ should be transliterated to i/en

33- sukun should be omitted instead of °

34- Should shadda double the letter instead of adding ̄?
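To make items 30 to 34 easier to compare, here is a minimal plain-Ruby sketch that applies both readings to the diacritics only. It is not the interscript map DSL: `DOC_RULES`, `SUGGESTED_RULES`, and `apply_marks` are hypothetical names, the targets in `DOC_RULES` are copied from the items above as quoted, and "in" is assumed for the "i/en" suggestion. Base letters are left untouched, and shadda is assumed to follow its letter directly.

```ruby
# Targets as listed in the reviewed table (copied from items 30-33 above).
DOC_RULES = {
  "\u064B" => "á",   # tanwin fath
  "\u064C" => "ú",   # tanwin damm
  "\u064D" => "in",  # tanwin kasr, as quoted in item 32
  "\u0652" => "°"    # sukun
}.freeze

# Targets suggested in the comments on items 30-33.
SUGGESTED_RULES = {
  "\u064B" => "an",
  "\u064C" => "un",
  "\u064D" => "in",  # assumption: "in" chosen for the "i/en" suggestion
  "\u0652" => ""     # sukun omitted
}.freeze

SHADDA = "\u0651"
MACRON = "\u0304" # combining macron

# Rewrites only the diacritics; base letters are left for the letter map.
def apply_marks(text, rules, double_shadda:)
  shadda_done =
    if double_shadda
      # Repeat the Arabic letter that carries the shadda.
      text.gsub(/([\u0621-\u064A])\u0651/) { Regexp.last_match(1) * 2 }
    else
      # Replace the shadda with a combining macron on the same letter.
      text.gsub(SHADDA, MACRON)
    end
  shadda_done.gsub(Regexp.union(rules.keys)) { |mark| rules[mark] }
end

word = "\u0643\u062A\u0627\u0628\u064C" # kitab with tanwin damm
puts apply_marks(word, DOC_RULES, double_shadda: false)
puts apply_marks(word, SUGGESTED_RULES, double_shadda: true)
```

Handling shadda before the other marks keeps the doubling step working on Arabic letters, so it does not depend on how each letter is romanized later.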

NOTES: 4.6)

Note 3) Should we consider sun/moon letters? Example: الشمس -> aš šams
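If sun-letter assimilation were adopted, the rule could look roughly like the sketch below. It is plain Ruby, not map DSL: `SUN_LETTERS` and `with_article` are hypothetical names, the caller supplies the romanized stem and its initial consonant, and every word starting with ال is assumed to carry the definite article, which is not always true. The moon-letter example (القمر, romanized here as "qamar") is an illustration that does not appear in the thread.

```ruby
# The 14 sun letters: ta, tha, dal, dhal, ra, zay, sin, shin,
# sad, dad, ta (emphatic), za (emphatic), lam, nun.
SUN_LETTERS = [
  "\u062A", "\u062B", "\u062F", "\u0630", "\u0631", "\u0632", "\u0633",
  "\u0634", "\u0635", "\u0636", "\u0637", "\u0638", "\u0644", "\u0646"
].freeze

ARTICLE = "\u0627\u0644" # alef + lam

# arabic_word:   unvocalized Arabic word
# latin_stem:    romanization of the word without the article
# latin_initial: romanization of the stem's first consonant
def with_article(arabic_word, latin_stem, latin_initial)
  return latin_stem unless arabic_word.start_with?(ARTICLE)

  first_letter = arabic_word[ARTICLE.length]
  # Before a sun letter the lam of the article assimilates to that letter.
  article = SUN_LETTERS.include?(first_letter) ? "a#{latin_initial}" : "al"
  "#{article} #{latin_stem}"
end

puts with_article("\u0627\u0644\u0634\u0645\u0633", "šams", "š")  # the example from Note 3
puts with_article("\u0627\u0644\u0642\u0645\u0631", "qamar", "q") # moon letter, article unchanged
```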

Note 4) Should we consider ta' marbuta at the end of sentences? For example, at the end of the sentence "ذهب الرجل الى المدينة" (the man went to the city), ta' marbuta should be pronounced as ah (madiynah).

And at the end of names/definite articles such as غزة and المدينة المنورة, should it be pronounced as ah as well?
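One way to frame the question in Note 4 is as a position-dependent rule. The sketch below, in plain Ruby, rewrites only a ta' marbuta that closes a phrase (end of text or directly before punctuation) to "ah" and leaves every other occurrence for the regular letter map. `PAUSAL_TA_MARBUTA` and `mark_pausal` are hypothetical names, the punctuation set is an assumption, and the sketch does not decide whether the first word of a name such as المدينة المنورة should also get "ah".

```ruby
# Ta' marbuta (U+0629) followed by optional whitespace and then either
# punctuation (Latin or Arabic comma, semicolon, question mark) or end of text.
PAUSAL_TA_MARBUTA = /\u0629(?=\s*(?:[.,!?\u060C\u061B\u061F]|\z))/

# Rewrites only phrase-final ta' marbuta; other letters stay in Arabic
# so that the ordinary letter map can process them afterwards.
def mark_pausal(text)
  text.gsub(PAUSAL_TA_MARBUTA, "ah")
end

puts mark_pausal("\u063A\u0632\u0629") # the name from Note 4, phrase-final
```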

ronaldtse commented 2 years ago

We have provided feedback but the team has not responded yet. We will be engaging with them soon.