interscript / maps

Script conversion maps for Interscript

Issues in ODNI Arabic system 2015 (ICS-630-01 Annex A) #284 #57

Open AhMohsen46 opened 3 years ago

AhMohsen46 commented 3 years ago

https://github.com/interscript/interscript/issues/284

In the attached document:

1. a) The definite article in names such as "عبدالرحمن" (Abd-al-Rahman) is sometimes separated with a hyphen, as here [image], and sometimes not, as here [image].

b) Also, is there a rule for using a hyphen within a name? For example, should الجُمهورية be "Al Jumhuriyah" or "Al-Jumhuriyah"?

c) Regarding this part of the hyphen special rule: "names marked by the lineage/family marker “Al” (e.g., Al Thani) are not hyphenated". I think this is more of a job for ML; I guess no rule can sum them all up.


2. Letters that share the same transliteration: "ص,س" > "s", "ت,ط" > "t", "د,ض" > "d". Are these really transliterated to the same letter, or are there symbols (underlines, for example) that are not displayed correctly because of the encoding? I know this is not a high priority, but I need to double-check.

ronaldtse commented 3 years ago

For b) since Special Rule 1 says "names marked by the lineage/family marker “Al” (e.g., Al Thani) are not hyphenated", I assume this is "Al Jumhuriyah". Right?

I think this will be more into ML; no rule can sum them all I guess

Is this because the "Al" in "Al Thani" (اَل ثاني) and "Al Jumhuriyah" (الجمهورية) cannot be distinguished?

ronaldtse commented 3 years ago

Clarifications requested from BGN.

AhMohsen46 commented 3 years ago

@ronaldtse "Al Jumhuriyah" means "the republic", but it also starts with "al". If the rule applies to all such words, that's fine; if not, it would be hard for ordinary maps to tell them apart, since these nouns have no unique property that distinguishes them.

ronaldtse commented 3 years ago

Received clarification:

General:

[...] strong preference/recommendation would be that any automated transliteration tool [..] adhere to the Romanization standards adopted by BGN/PCGN. The ODNI system in particular leaves a lot of room for potential confusion — the two instances [...] are prime examples.

  1. Hyphenation

BGN does not hyphenate these structures at all. Arabic and its Romanization vary greatly depending on the country/region and the organization conducting the transliteration. In my opinion, uniformity is one of the greatest challenges in automating this process. [...] I would recommend strict adherence to the BGN standard on this occasion, thus avoiding all hyphens and mitigating any potential confusion.

  2. Overlapping target characters

These letters do make similar sounds to their associated Latin equivalents. For this reason, we use diacritics to distinguish between د ض and ذ, all of which roughly make a “d” sound. Our romanizations for these letters are ḍ, d, and dh, respectively. Diacritics are crucial for letter distinction if one-to-one correspondence is to be maintained for reverse transliteration purposes.

So we will:

  1. Not handle hyphenation, in accordance with the BGN Arabic system.
  2. Utilize the described diacritics to differentiate the transliterations.
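
Point 2 can be illustrated with a small sketch of why distinct targets matter for reverse transliteration. This is only an illustration in Python, not the Interscript map format: the table name `DOTTED` and the helper functions are hypothetical, and the under-dot forms are the ones discussed in this thread rather than a verified copy of the BGN table.

```python
import unicodedata

def nfc(text):
    """Normalize so that each dotted letter is a single code point."""
    return unicodedata.normalize("NFC", text)

# Hypothetical consonant table for illustration only.
DOTTED = {arabic: nfc(latin) for arabic, latin in {
    "س": "s", "ص": "ṣ",
    "ت": "t", "ط": "ṭ",
    "د": "d", "ض": "ḍ",
}.items()}

def is_reversible(table):
    """True when no two source letters share the same Latin target."""
    return len(set(table.values())) == len(table)

def invert(table):
    """Swap keys and values; only meaningful for a reversible table."""
    return {latin: arabic for arabic, latin in table.items()}

def transliterate(text, table):
    """Replace each character found in the table, keep the rest unchanged."""
    return "".join(table.get(ch, ch) for ch in nfc(text))

word = "صسطتضد"
latin = transliterate(word, DOTTED)
restored = transliterate(latin, invert(DOTTED))
```

With distinct targets for each letter, `restored` equals `word`, which is the one-to-one property the clarification describes.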

@AhMohsen46 are we all set? Thanks!

AhMohsen46 commented 3 years ago

Thanks for the clarification, Sir!

1. Removed all hyphenation.
2. Does this apply to all of these? Do they all take under-dots in BGN? "ص,س" > "s", "ت,ط" > "t", "د,ض" > "d"

So would they look like this? 'ṣ' # ص, 'ḍ' # ض, 'ṭ' # ط, 'ẓ' # ظ

AhMohsen46 commented 3 years ago

https://github.com/interscript/interscript/pull/646

This includes the fixes, along with ODNI Arabic 2004, taking these into account: 'ṣ' # ص, 'ḍ' # ض, 'ṭ' # ط, 'ẓ' # ظ

If they are not supposed to be like that, please let me know and I will revert it to what it was. I think this is right, though: I applied the 'ḍ' # ض case to the others, since they are all pronounced roughly the same.

ronaldtse commented 3 years ago

@AhMohsen46 clarification sought. Thanks!

ronaldtse commented 3 years ago

@AhMohsen46 it has been clarified that none of the ODNI systems use diacritics, so this is correct: "ص,س" > "s" "ت,ط" > "t" "د,ض" > "d"
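
A consequence of the diacritic-free mapping is that these ODNI pairs cannot be told apart when going back from Latin to Arabic. The sketch below, in Python with the hypothetical names `ODNI_PLAIN` and `collisions`, lists which source letters share a target. It only covers the three pairs quoted above and is not the actual Interscript map.

```python
from collections import defaultdict

# Only the pairs quoted in this thread; a hypothetical table for illustration.
ODNI_PLAIN = {
    "ص": "s", "س": "s",
    "ت": "t", "ط": "t",
    "د": "d", "ض": "d",
}

def collisions(table):
    """Group source letters by target and keep targets used more than once."""
    groups = defaultdict(list)
    for arabic, latin in table.items():
        groups[latin].append(arabic)
    return {latin: letters for latin, letters in groups.items() if len(letters) > 1}

shared = collisions(ODNI_PLAIN)
for latin, letters in shared.items():
    print(latin, "<-", " ".join(letters))
```

Each of "s", "t" and "d" is reported with two source letters, so a reverse map for this system would need extra information, such as a word list, to pick the right one.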