Preserve source - Thaana & Perso-Arabic consonants

GokulNC commented 2 years ago

Is it possible to support preserve source for the Dhivehi and Urdu consonants that are of Perso-Arabic origin?

For your reference:

Thaana	Perso-Arabic	Devanagari	Roman
ޙ‎	ح	ह़	h̤
ޜ	ژ	झ़	ž
ޘ‎	ث	थ़ / स़़	ṯ / s̱
ޛ‎	ذ	द़ / ज़़	ḏ / ẕ
ޞ‎	ص	स़	s̤
ޟ‎	ض‎	ध़ / ॹ	ḋ / ż
ޠ‎	ط‎	त़	t̤
ޡ‎	ظ	ध़़ / ॹ़	d̤ / z̤
ޥ		व़	w

Please note that:

Some of the Devanagari consonants are made up to allow a one-to-one mapping. Some of them carry multiple nuqtas, which may not be individually visible in the default font.
Some phonemes are transcribed differently depending on whether the Arabic reading (as in literary Dhivehi) or the Persian reading (as in Urdu) is followed, hence the two alternatives in some rows.

I might have missed something; please feel free to implement it the way you see fit. Thanks!
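
As a rough illustration of what a lossless map implies, here is a minimal Python sketch built from the first alternative in each table row. The names `ARABIC_TO_DEVA`, `to_devanagari` and `to_arabic` are hypothetical and are not taken from Aksharamukha; the sketch only shows that the targets are distinct and that the reverse direction needs longest-match handling, because the double-nuqta form starts with the single-nuqta one.

```python
import re

NUQTA = "\u093c"  # DEVANAGARI SIGN NUKTA

# Perso-Arabic letter -> Devanagari, first alternative of each table row.
# Hypothetical table for illustration only, not Aksharamukha's internal data.
ARABIC_TO_DEVA = {
    "\u062d": "\u0939" + NUQTA,          # ح -> ह़
    "\u0698": "\u091d" + NUQTA,          # ژ -> झ़
    "\u062b": "\u0925" + NUQTA,          # ث -> थ़
    "\u0630": "\u0926" + NUQTA,          # ذ -> द़
    "\u0635": "\u0938" + NUQTA,          # ص -> स़
    "\u0636": "\u0927" + NUQTA,          # ض -> ध़
    "\u0637": "\u0924" + NUQTA,          # ط -> त़
    "\u0638": "\u0927" + NUQTA + NUQTA,  # ظ -> ध़़ (double nuqta)
}

# The map is one-to-one only if no two letters share a Devanagari target.
assert len(set(ARABIC_TO_DEVA.values())) == len(ARABIC_TO_DEVA)

DEVA_TO_ARABIC = {deva: arabic for arabic, deva in ARABIC_TO_DEVA.items()}

# Longest targets first, so the double-nuqta form wins over its
# single-nuqta prefix when reading Devanagari back.
_DEVA_PATTERN = re.compile(
    "|".join(re.escape(d) for d in sorted(DEVA_TO_ARABIC, key=len, reverse=True))
)


def to_devanagari(text: str) -> str:
    """Map each listed Perso-Arabic letter; leave everything else untouched."""
    return "".join(ARABIC_TO_DEVA.get(ch, ch) for ch in text)


def to_arabic(text: str) -> str:
    """Reverse the mapping using longest-match replacement."""
    return _DEVA_PATTERN.sub(lambda m: DEVA_TO_ARABIC[m.group(0)], text)
```

Passing all eight letters through `to_devanagari` and then `to_arabic` should return the original string, which is the property the request is about.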

Edit: Suggestions for Thaana:

For diphthongs (like ޤައުމަކަށް -> qa:umakaṣ), please use the U+A789 colon instead of the regular colon.
For word-final /އް/, please use U+02BE instead of ʔ.
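
The two suggestions could be sketched as a post-processing step on the romanized output. This is only an illustration: the function name is hypothetical, it assumes the current output uses the ASCII colon and U+0294 (ʔ), and it assumes every ASCII colon in the text is a diphthong marker.

```python
import re

MODIFIER_COLON = "\ua789"   # MODIFIER LETTER COLON
RIGHT_HALF_RING = "\u02be"  # MODIFIER LETTER RIGHT HALF RING
GLOTTAL_STOP = "\u0294"     # LATIN LETTER GLOTTAL STOP (assumed current output)


def apply_thaana_roman_suggestions(roman: str) -> str:
    """Hypothetical helper applying both suggestions to romanized text."""
    # Assumption: every ASCII colon marks a diphthong, so all are replaced.
    roman = roman.replace(":", MODIFIER_COLON)
    # Replace the glottal stop only where no word character follows it,
    # i.e. at the end of a word.
    return re.sub(GLOTTAL_STOP + r"(?!\w)", RIGHT_HALF_RING, roman)
```

A glottal stop in the middle of a word is left as it is, since the suggestion only covers the word-final case.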

virtualvinodh commented 2 years ago

@GokulNC Thanks for this!

Isn't there an established way to transcribe Urdu to Devanagari uniquely (that is, one-to-one) without using multiple nuqtas? I am sure somebody must have worked out a standardized mapping.

V

GokulNC commented 2 years ago

So far I haven't come across any standard or proposal that uniquely maps all Perso-Arabic consonants to Devanagari. I think this is mainly because there is no need for one in ordinary use. Such a map only makes sense for lossless transliteration (for computational purposes).

For example, Persian and Urdu have four graphemes (ز , ذ , ض , ظ) for the same phoneme (IPA /z/), for which the single Devanagari "ज़" is sufficient. The same holds for स, त and ह.
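
To make the loss concrete, here is a small Python sketch of the conventional phonetic mapping for these four letters. The `PHONETIC` dictionary and the `collisions` helper are hypothetical names used only for this example.

```python
from collections import defaultdict

ZA = "\u091c\u093c"  # ज़ (ja + nuqta)

# Conventional phonetic mapping: all four /z/ letters share one target.
PHONETIC = {
    "\u0632": ZA,  # ز
    "\u0630": ZA,  # ذ
    "\u0636": ZA,  # ض
    "\u0638": ZA,  # ظ
}


def collisions(mapping):
    """Return the targets that are reached from more than one source letter."""
    by_target = defaultdict(list)
    for source, target in mapping.items():
        by_target[target].append(source)
    return {t: s for t, s in by_target.items() if len(s) > 1}
```

Any target reported by `collisions` cannot be converted back to a single source letter, which is why a lossless scheme needs a distinct Devanagari form for each grapheme.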

Some of the consonants in the table above (ह़/ح, झ़/ژ, थ़/ث, स़/ص, त़/ط) comply with the latest ISO 15919 Indic transliteration standard.

There seems to have been an established mapping for Hindustani, which I came across in the book The Hindee-Roman Orthoepigraphical Ultimatum (see pages cxxxvi and cxxxvii).

This features the double-nuqta स for ث (स़़, which shows two nuqtas in some fonts). I am not sure about the other Nagari characters listed for the unique mapping; they do not seem to exist in modern Devanagari.

GokulNC commented 2 years ago

For Mahl Dhivehi (of Lakshadweep), I found this mapping for Devanagari: https://dv.wikipedia.org/wiki/ވިކިޕީޑިއާ:How_to_read_Devanagari_script (See additional consonants section in the image)

I am not sure of its origin or authenticity.

virtualvinodh commented 2 years ago

Fixed. Will be available in the next update.

I had to make up parts of the mapping myself, but this should work.

You have to enable preserve script to access the extra nuktated consonants.

V

virtualvinodh / aksharamukha

Preserve source - Thaana & Perso-Arabic consonants #142