virtualvinodh / aksharamukha


Tamil Extended Vl (LLi) CVC combo #246

Closed nsesha92 closed 9 months ago

nsesha92 commented 1 year ago

kLLi should be equal to കഌ, just as ka = ക, etc., but it is kaLLi that is mapped to കഌ. The same applies to kh, g, ... h: all pure consonants combined with the vowel sign LLi.

GokulNC commented 1 year ago

This issue seems to be across all Indic scripts.

E.g., ITRANS paLLi to Tamil gives பலுʼ, whereas it should have been பள்ளி.

Suggestion: It would be better to add an option to enable Vedic transliteration (defaulting to false), since ऌ, ॡ and ॠ are not used in any other languages.

GokulNC commented 1 year ago

Also, v^eRRi transliterates to வெருʼ instead of வெற்றி.

So the above option could be applicable for Sanskrit ऋ as well

nsesha92 commented 1 year ago

ऌ, ॡ, and ॠ are USED in ALL other Indic languages, other than Tamil. And they are NOT Vedic.

As for பள்ளி and வெற்றி, I never used or tested these words, and this is the first time I have realised the ITRANS coding problem. Aksharamukha cannot fix this issue and ITRANS is not going to modify the coding scheme.

But you can use the escape sequence to get the right word in Tamil:

paL_Li v^eR_Ri

Use underscore "_" for no-output (row 157 in itrans 5.3 spreadsheet).
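To see why the underscore helps, here is a minimal sketch that assumes the input is read by always taking the longest matching token. That reading strategy and the tiny token table are assumptions made for illustration; they are not taken from the ITRANS spreadsheet or from Aksharamukha's code.

```python
# Hypothetical token table: a tiny subset chosen for illustration only.
TOKENS = ["LLi", "RRi", "^e", "L", "R", "p", "v", "a", "i", "_"]


def tokenize(text: str) -> list[str]:
    """Split text into tokens, always taking the longest token that matches."""
    ordered = sorted(TOKENS, key=len, reverse=True)
    out = []
    pos = 0
    while pos < len(text):
        for tok in ordered:
            if text.startswith(tok, pos):
                out.append(tok)
                pos += len(tok)
                break
        else:
            raise ValueError(f"no token matches at position {pos}: {text[pos:]!r}")
    return out


print(tokenize("paLLi"))    # ['p', 'a', 'LLi']
print(tokenize("paL_Li"))   # ['p', 'a', 'L', '_', 'L', 'i']
print(tokenize("v^eR_Ri"))  # ['v', '^e', 'R', '_', 'R', 'i']
```

Without the underscore, LLi is consumed as a single vowel token; with it, the two L consonants are kept apart and the underscore itself produces no output.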

Azhagi+ coding uses lux, Lux, rux, and Rux and hence this issue does not crop up.

GokulNC commented 1 year ago

Thank you, @nsesha92.

"you can use the escape sequence to get the right word"

Interesting, did not know this.

It would be great if a post-processing option were available for ITRANS output to explicitly escape such ambiguous cases (we might even call it "preserve source"). This is required so that we can take the romanized text and convert it back to the native script without any discrepancy from the original text.

virtualvinodh commented 9 months ago

I have changed the outputs for பள்ளி and வெற்றி to paL_Li and v^eR_Ri.

Basically, if a vowel precedes the sequence RRi or LLi, the sequence becomes R_Ri or L_Li respectively.

I will push this in the next update.
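The rule described in this comment could be sketched as a regex pass over the romanized output. This is only an illustration: the function name `escape_ambiguous_clusters` and the vowel set are hypothetical, and the actual change in Aksharamukha may be implemented differently.

```python
import re

# Assumed set of single-letter ITRANS vowels that can precede the cluster.
_VOWELS = "aAiIuUeEoO"

# A vowel, then a doubled R or L, then "i"; only the doubled letter is consumed.
_AMBIGUOUS = re.compile(rf"(?<=[{_VOWELS}])([RL])\1(?=i)")


def escape_ambiguous_clusters(itrans_text: str) -> str:
    """Insert "_" inside RRi / LLi when a vowel comes directly before it."""
    return _AMBIGUOUS.sub(r"\1_\1", itrans_text)


print(escape_ambiguous_clusters("paLLi"))   # paL_Li
print(escape_ambiguous_clusters("v^eRRi"))  # v^eR_Ri
print(escape_ambiguous_clusters("kRRi"))    # kRRi (no vowel before RRi)
```

Note that this sketch only makes sense for text romanized from a source where the sequence really is a consonant cluster; it would also rewrite a string such as kaRRi, and it does not cover RRI or LLI.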

nsesha92 commented 9 months ago

Thanks, that's a fantastic workaround/fix for paLLi = பள்ளி, veRRi = வெற்றி, and other such RRi and LLi combos. Please extend the same logic to any LLI and RRI as well.

But the bug I originally reported is for kLLi vs kaLLi, as opposed to kRRi and kaRRI for Tamilext

ka | ക | க | क
kA | കാ | கா | का
ki | കി | கி | कि
kI | കീ | கீ | की
ku | കു | கு | कु
kU | കൂ | கூ | कू
kRRi | കൃ | க்ருʼ | कृ
kaRRi | കഋ | கருʼ | कऋ
kRRI | കൄ | க்ரூʼ | कॄ
kaRRI | കൠ | கரூʼ | कॠ

ke | കെ | கெ | कॆ
kE | കേ | கே | के
kai | കൈ | கை | कै
ko | കൊ | கொ | कॊ
kO | കോ | கோ | को
kau | കൌ | கௌ | कौ

The entries highlighted above are correct in Malayalam (when copy-pasted), but perhaps not correct in ak for Tamilext. Maybe I am confused.

Will the proposed fix take care of the above?