virtualvinodh / aksharamukha


Preserve source - Thaana & Perso-Arabic consonants #142

Closed GokulNC closed 2 years ago

GokulNC commented 2 years ago

Is it possible to preserve source for Dhivehi & Urdu consonants which are of Perso-Arabic origin?

For your reference:

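| Thaana | Perso-Arabic | Devanagari | Roman |
|---|---|---|---|
| ޙ | ح | ह़ | |
| ޜ | ژ | झ़ | ž |
| ޘ | ث | थ़ / स़़ | ṯ / s̱ |
| ޛ | ذ | द़ / ज़़ | ḏ / ẕ |
| ޞ | ص | स़ | |
| ޟ | ض | ध़ / ॹ | ḋ / ż |
| ޠ | ط | त़ | |
| ޡ | ظ | ध़़ / ॹ़ | d̤ / z̤ |
| ޥ | | व़ | w |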

Please note that:

I might have missed something, please feel free to implement it the way you see it right. Thanks!


Edit: Suggestions for Thaana--

virtualvinodh commented 2 years ago

@GokulNC Thanks for this!

Isn't there an established way to uniquely (as in one-to-one) transcribe Urdu to Devanagari that doesn't use multiple nuktas? I am sure somebody must have worked out a standardized mapping.

V

GokulNC commented 2 years ago

So far I haven't come across any standards or proposals that uniquely map all Perso-Arabic consonants to Devanagari. I think this is mainly because there is no need for one. Such a map only makes sense for lossless transliteration (for computational purposes).

For example, Persian and Urdu have four graphemes (ز, ذ, ض, ظ) for the same phoneme (IPA /z/), for which the Devanagari "ज़" is sufficient. The same holds for स, त, and ह.
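The difference between a phonetic mapping and a source-preserving one can be sketched in a few lines of Python. This is only an illustration: the names `LOSSY`, `LOSSLESS`, `encode`, `decode`, and `is_one_to_one` are hypothetical, and the tables are not Aksharamukha's actual ones. The source-preserving targets follow the first Devanagari option from the table at the top of this issue, with ज़ assumed for ز.

```python
# Hypothetical tables for illustration; not Aksharamukha's actual mapping.
NUKTA = "\u093c"  # DEVANAGARI SIGN NUKTA
JA = "\u091c"     # DEVANAGARI LETTER JA
DA = "\u0926"     # DEVANAGARI LETTER DA
DHA = "\u0927"    # DEVANAGARI LETTER DHA

ZAIN = "\u0632"   # ARABIC LETTER ZAIN
THAL = "\u0630"   # ARABIC LETTER THAL
DAD = "\u0636"    # ARABIC LETTER DAD
ZAH = "\u0638"    # ARABIC LETTER ZAH

# Phonetic (lossy): all four /z/ letters collapse to JA + NUKTA.
LOSSY = {letter: JA + NUKTA for letter in (ZAIN, THAL, DAD, ZAH)}

# Source-preserving: one distinct Devanagari sequence per source letter.
LOSSLESS = {
    ZAIN: JA + NUKTA,
    THAL: DA + NUKTA,
    DAD: DHA + NUKTA,
    ZAH: DHA + NUKTA + NUKTA,  # double nukta
}


def is_one_to_one(mapping):
    """True if no two source letters share a target sequence."""
    return len(set(mapping.values())) == len(mapping)


def encode(text, mapping):
    """Replace each mapped character; leave everything else unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)


def decode(text, mapping):
    """Invert the mapping, trying longer target sequences first so that
    a double-nukta sequence is not mistaken for its single-nukta prefix."""
    inverse = {target: source for source, target in mapping.items()}
    targets = sorted(inverse, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for target in targets:
            if text.startswith(target, i):
                out.append(inverse[target])
                i += len(target)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


word = ZAIN + THAL + DAD + ZAH
print(is_one_to_one(LOSSY), is_one_to_one(LOSSLESS))  # False True
print(decode(encode(word, LOSSLESS), LOSSLESS) == word)  # True
```

Only the one-to-one table survives a round trip; with the phonetic table, the original spelling cannot be recovered from the Devanagari output.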

Some of the above consonants in table (ह़/ح, झ़/ژ, थ़/ث, स़/ص, त़/ط‎) comply with the latest ISO 15919 Indic Transliteration standard.


There seems to have been an established mapping for Hindustani, which I came across in the book The Hindee-Roman Orthoepigraphical Ultimatum (see pages cxxxvi and cxxxvii).

This features the double-nuqta स for ث (स़़, which shows two nuqtas in some fonts). I am not sure about the other Nagari characters listed for the unique mapping; they do not seem to be in modern Devanagari.
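For anyone checking how the double-nuqta form is encoded, here is a small Python sketch using the standard `unicodedata` module. The helper name `describe` is made up for this example. It shows that the double-nuqta स is a base letter followed by two combining nukta signs, which is why rendering depends on the font.

```python
import unicodedata

# SA followed by two combining nukta signs (the double-nuqta form).
DOUBLE_NUKTA_SA = "\u0938\u093c\u093c"


def describe(text):
    """List each code point in the text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(DOUBLE_NUKTA_SA):
    print(line)
# U+0938 DEVANAGARI LETTER SA
# U+093C DEVANAGARI SIGN NUKTA
# U+093C DEVANAGARI SIGN NUKTA
```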

GokulNC commented 2 years ago

For Mahl Dhivehi (of Lakshadweep), I found this mapping for Devanagari: https://dv.wikipedia.org/wiki/ވިކިޕީޑިއާ:How_to_read_Devanagari_script (See additional consonants section in the image)

Not sure of its origin and authenticity.

virtualvinodh commented 2 years ago

Fixed. Will be available in the next update.

I had to make up my own mapping a bit but this should work.

You have to enable the preserve source option to access the extra nuktated consonants.

V