Open AhMohsen46 opened 3 years ago
For b) since Special Rule 1 says "names marked by the lineage/family marker “Al” (e.g., Al Thani) are not hyphenated", I assume this is "Al Jumhuriyah". Right?
I think this will be more of an ML problem; I guess no single rule can cover them all.
Is this because the "Al" in "Al Thani" (اَل ثاني) and "Al Jumhuriyah" (الجمهورية) cannot be distinguished?
Clarifications requested from BGN.
@ronaldtse "Al Jumhuriyah" means "the republic", but it also starts with "al". If the rule is applied to all of them, that's fine; if not, it would be hard for ordinary maps to tell them apart, because these nouns have no unique property that distinguishes them.
Received clarification:
General:
[...] strong preference/recommendation would be that any automated transliteration tool [..] adhere to the Romanization standards adopted by BGN/PCGN. The ODNI system in particular leaves a lot of room for potential confusion — the two instances [...] are prime examples.
BGN does not hyphenate these structures at all. Arabic and its Romanization vary greatly depending on the country/region and the organization conducting the transliteration. In my opinion, uniformity is one of the greatest challenges in automating this process. [...] I would recommend strict adherence to the BGN standard on this occasion, thus avoiding all hyphens and mitigating any potential confusion.
These letters do make similar sounds to their associated Latin equivalents. For this reason, we use diacritics to distinguish between ض, د, and ذ, all of which roughly make a “d” sound. Our romanizations are ḍ for ض, d for د, and dh for ذ. Diacritics are crucial for letter distinction if one-to-one correspondence is to be maintained for reverse transliteration purposes.
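To illustrate the one-to-one point, here is a minimal sketch that checks whether a letter mapping can be reversed at the letter level. It uses only the three letters named in the clarification above; the diacritic-free variant and the helper name `is_one_to_one` are hypothetical and are not part of interscript.

```python
from collections import Counter

# Only the three letters named in the clarification above; not a full table.
WITH_DIACRITICS = {"ض": "ḍ", "د": "d", "ذ": "dh"}

# Hypothetical diacritic-free variant, included only for comparison.
WITHOUT_DIACRITICS = {"ض": "d", "د": "d", "ذ": "dh"}


def is_one_to_one(mapping):
    """Return True if no two source letters share the same romanization."""
    counts = Counter(mapping.values())
    return all(n == 1 for n in counts.values())


print(is_one_to_one(WITH_DIACRITICS))     # True
print(is_one_to_one(WITHOUT_DIACRITICS))  # False
```

Letter-level uniqueness is only a necessary condition: a full reverse transliteration would also have to deal with digraphs such as "dh", which this sketch does not attempt.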
So we will:
@AhMohsen46 are we all set? Thanks!
Thanks for the clarification, sir!
1. Removed all hyphenation.
2. Does this apply to all of these? Do they all take under-dots in BGN? "ص,س" > "s", "ت,ط" > "t", "د,ض" > "d"

So should they look like this? 'ṣ' # ص, 'ḍ' # ض, 'ṭ' # ط, 'ẓ' # ظ
https://github.com/interscript/interscript/pull/646
This includes the fixes, along with ODNI Arabic 2004, taking these into account: 'ṣ' # ص, 'ḍ' # ض, 'ṭ' # ط, 'ẓ' # ظ
If they are not like that, please let me know and I will revert it to what it was. I think they are, though: I extended the 'ḍ' # ض case to the others, since they are all pronounced in roughly the same way.
@AhMohsen46 clarification sought. Thanks!
@AhMohsen46 it has been clarified that no ODNI system uses diacritics, so this is correct: "ص,س" > "s", "ت,ط" > "t", "د,ض" > "d"
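As a rough sketch of what this means in practice, the snippet below applies only the six letter mappings confirmed above and then builds the reverse index, which shows that each Latin letter maps back to two Arabic candidates. The names `ODNI_STYLE`, `romanize_letters`, and `reverse_candidates` are hypothetical; this is not the interscript map format, and every character outside the six letters is passed through unchanged.

```python
from collections import defaultdict

# The six letter mappings confirmed in this thread; not a complete ODNI table.
ODNI_STYLE = {
    "ص": "s", "س": "s",
    "ت": "t", "ط": "t",
    "د": "d", "ض": "d",
}


def romanize_letters(text, mapping=ODNI_STYLE):
    """Replace each mapped letter; leave every other character unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)


def reverse_candidates(mapping=ODNI_STYLE):
    """Group source letters by romanization to expose ambiguous reversals."""
    inverse = defaultdict(list)
    for arabic, latin in mapping.items():
        inverse[latin].append(arabic)
    return dict(inverse)


print(romanize_letters("صس"))          # ss
print(len(reverse_candidates()["s"]))  # 2
```

Because the mapping is many-to-one, a reverse transliteration from this output cannot recover the original letter without extra context.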
https://github.com/interscript/interscript/issues/284
In the attached document:
1. a) A name containing the definite article, such as "عبدالرحمن" (Abd-al-Rahman), is sometimes written with a hyphen as a separator, as here
and sometimes without one, as here
![image](https://user-images.githubusercontent.com/34799755/94366124-2f35fa00-00d6-11eb-83e2-5474283a3f86.png)
b) Also, I'd like to know whether there is a rule for using a hyphen within a name. For example, should الجُمهورية be
Al Jumhuriyah
or Al-Jumhuriyah?
c) For this part of the hyphen special rule:
names marked by the lineage/family marker “Al” (e.g., Al Thani) are not hyphenated
I think this will be more of an ML problem; I guess no single rule can cover them all.
2. Letters that share a transliteration, such as "ص,س" > "s", "ت,ط" > "t", "د,ض" > "d": are these really transliterated to the same letter, or are there symbols, such as underlines, that are not displayed correctly in the encoding? I know this is not a high priority, but I need to double-check.