virtualvinodh / aksharamukha

Preserve khanda_ta during transliteration #206

Closed GokulNC closed 1 year ago

GokulNC commented 1 year ago

Is there a way to retain 'ৎ‌' as a distinct character instead of converting it to the target script's equivalent of 'ত্'? For example, it could be rendered as त़् in Devanagari, ṯ in ISO, etc.

Or rather, is there any rule in Bengali that governs when 'ৎ‌' should be used and when 'ত্' should be used?


This is essential for lossless round-trip conversion. For example, "তত্ব" is always transliterated to "तत्ब", which in turn is always back-transliterated as "তৎ‌ব".
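To make the round-trip problem concrete, here is a minimal toy sketch in Python. It is not Aksharamukha's implementation: the function names, the `preserve_khanda_ta` flag, and the three-letter mapping table are hypothetical, and the त़् marker is simply the one proposed above. It shows that when ৎ and ত্ both map to त्, two different Bengali words collide, whereas a distinct marker lets both survive the round trip.

```python
# Toy model only: a three-letter mapping, not Aksharamukha's actual tables.
KHANDA_TA = "\u09CE"                    # ৎ BENGALI LETTER KHANDA TA
DEVA_TA_VIRAMA = "\u0924\u094D"         # त् (ta + virama)
DEVA_MARKED_TA = "\u0924\u093C\u094D"   # त़् (ta + nukta + virama), hypothetical marker

# ত -> त, ব -> ब, hasanta -> virama
BN_TO_DEVA = {"\u09A4": "\u0924", "\u09AC": "\u092C", "\u09CD": "\u094D"}
DEVA_TO_BN = {deva: bn for bn, deva in BN_TO_DEVA.items()}


def to_devanagari(text, preserve_khanda_ta=False):
    """Map Bengali to Devanagari, optionally keeping Khanda Ta distinct."""
    khanda_target = DEVA_MARKED_TA if preserve_khanda_ta else DEVA_TA_VIRAMA
    text = text.replace(KHANDA_TA, khanda_target)
    return text.translate(str.maketrans(BN_TO_DEVA))


def to_bengali(text):
    """Map Devanagari back to Bengali, restoring Khanda Ta from the marker."""
    text = text.replace(DEVA_MARKED_TA, KHANDA_TA)
    return text.translate(str.maketrans(DEVA_TO_BN))


with_hasanta = "\u09A4\u09A4\u09CD\u09AC"   # তত্ব
with_khanda = "\u09A4\u09CE\u09AC"          # তৎব

# Without the marker, both spellings produce the same Devanagari string,
# so no back-transliteration can recover both.
collision = to_devanagari(with_hasanta) == to_devanagari(with_khanda)

# With the marker, each spelling comes back unchanged.
lossless = all(
    to_bengali(to_devanagari(word, preserve_khanda_ta=True)) == word
    for word in (with_hasanta, with_khanda)
)
print(collision, lossless)
```

The marker sequence is replaced before the per-character mapping in both directions, so the nukta never reaches the single-character table.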

(Interestingly, the post-processing option khandatabatova also does not seem to work, which I presume is a bug.)

virtualvinodh commented 1 year ago

@GokulNC

/ৎ‌/ is just an allograph of /ত্/. AFAIK, it is used pretty much to avoid /ত্/ as a grapheme. So it will occur in word-final position and in consonant clusters that don't have proper conjunct forms. E.g., উৎকল /utkala/: /tka/ doesn't have a special graphemic form, unlike clusters like ত্ন /tna/ or /tla/.
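The heuristic described above can be sketched roughly as follows. This is an assumption-laden illustration, not an orthographic rule and not anything Aksharamukha implements: the function name `vowelless_ta` is hypothetical, and the set of consonants that form a conjunct with ত is deliberately incomplete (it lists only ন and ল from this comment and ব from the "তত্ব" example in the question).

```python
KHANDA_TA = "\u09CE"          # ৎ
TA_HASANTA = "\u09A4\u09CD"   # ত্ (ta + hasanta)

# Illustrative and incomplete: consonants assumed to form a conjunct with ত.
CONJUNCT_AFTER_TA = {
    "\u09A8",  # ন, as in ত্ন /tna/
    "\u09B2",  # ল, as in /tla/
    "\u09AC",  # ব, as in the "তত্ব" example from the question
}


def vowelless_ta(next_char=None):
    """Pick a spelling for vowelless /t/ given the following consonant.

    next_char=None stands for word-final position.
    """
    if next_char is None or next_char not in CONJUNCT_AFTER_TA:
        return KHANDA_TA
    return TA_HASANTA


# উৎকল /utkala/: ক is not in the conjunct set, so Khanda Ta is chosen.
utkala = "\u0989" + vowelless_ta("\u0995") + "\u0995\u09B2"
print(utkala)
```

A real rule would need the full conjunct inventory and any lexical exceptions, which is why an explicit marker in the target script is the safer route for lossless conversion.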

I don't know if Bangla Academy has any specific rules as such.

virtualvinodh commented 1 year ago

@GokulNC

I have added an option to highlight Khanda TA (similar to Malayalam Chillu letters).

Will push it this week.