virtualvinodh / aksharamukha


NasaltoAnsvaraIASTISO doesn't nasalize final anusvara for central Indo-Aryan langs #129

Closed GokulNC closed 2 years ago

GokulNC commented 3 years ago

For example, the Hindi word "हां" ("haan", meaning "yes") is ISO-transliterated as "hāṃ" even when the NasaltoAnsvaraIASTISO flag is enabled. (BTW, shouldn't the flag be named AnusvaraToNasalIASTISO?)

For languages like Hindi, Punjabi and Gujarati, I think even the final anusvara should always be nasalized.

Is it possible to add a separate flag called FinalAnusvaraToNasalIASTISO ?

Edit:

I think this should be done only if the final anusvara is preceded by a dependent vowel, as in कपड़ों (kapdon). Where it is preceded by just a consonant, as in कार्यं (kaaryam), it should be retained as "ṃ". I'm not sure whether this applies to all languages.
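The rule in this edit can be sketched as a small check on a Devanagari word. This is only an illustration, not Aksharamukha code: the function names are hypothetical, and it assumes the dependent vowel signs of interest are the ones in U+093E to U+094C.

```python
ANUSVARA = "\u0902"  # Devanagari sign anusvara


def is_dependent_vowel(ch: str) -> bool:
    """True for Devanagari dependent vowel signs (matras), assumed here
    to be the code points U+093E..U+094C."""
    return "\u093e" <= ch <= "\u094c"


def final_anusvara_is_nasalization(word: str) -> bool:
    """Hypothetical helper: True when the word ends in an anusvara that
    directly follows a dependent vowel sign."""
    if len(word) < 2 or not word.endswith(ANUSVARA):
        return False
    return is_dependent_vowel(word[-2])


for w in ("हां", "कपड़ों", "कार्यं"):
    print(w, final_anusvara_is_nasalization(w))
```

With this check, हां and कपड़ों would be candidates for nasalization, while कार्यं (anusvara directly after a consonant) would keep "ṃ".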

virtualvinodh commented 3 years ago

I suppose I could do that. I'll mark it now as an enhancement.

virtualvinodh commented 3 years ago

@GokulNC Would it make sense for people to use the "Use tilde for nasalization" flag instead? That makes sense for North Indic languages.

The pronunciation of the final Anusvara is highly language dependent. In Bengali/Sinhala it would be /ng/, in Hindi /n/, and in South Indian languages /m/. At that point we are entering the realm of transcription, not transliteration.

We would have to add three to four options to satisfy every single language group.
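To make the "three to four options" concrete, here is a minimal sketch of a per-language-group lookup applied to romanized output. Nothing here exists in Aksharamukha: the table, its keys, and the function name are hypothetical, and the values simply restate the pronunciations listed above (with /ng/ written as ṅ).

```python
import unicodedata

# Hypothetical option table: language group -> replacement for a final anusvara.
FINAL_ANUSVARA_BY_GROUP = {
    "bengali": "\u1e45",       # ṅ, for /ng/
    "sinhala": "\u1e45",       # ṅ, for /ng/
    "hindi": "n",              # /n/
    "south_indian": "m",       # /m/
}

# Romanized anusvara: ṃ (IAST style) or ṁ (ISO style), in precomposed form.
ROMAN_ANUSVARA = ("\u1e43", "\u1e41")


def transcribe_final_anusvara(word: str, group: str) -> str:
    """Replace a word-final romanized anusvara according to the group.
    Unknown groups leave the word unchanged."""
    word = unicodedata.normalize("NFC", word)
    if word.endswith(ROMAN_ANUSVARA) and group in FINAL_ANUSVARA_BY_GROUP:
        return word[:-1] + FINAL_ANUSVARA_BY_GROUP[group]
    return word


print(transcribe_final_anusvara("h\u0101\u1e43", "hindi"))
```

Each entry in the table corresponds to one option a user would have to pick, which is where transliteration starts shading into transcription.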

V

GokulNC commented 3 years ago

Makes sense. BTW the backend seems to be down, not sure if that's just for me.

Also NasalTilde post_options seems to be missing for Python just like https://github.com/virtualvinodh/aksharamukha/issues/127


> We would have to add three to four options to satisfy every single language group

You are right. I understand the difficulty: at the stage of post-processing the roman text, it would not make sense to look at what the source script is. But I think it would be helpful to add an option for handling them, even though this moves slightly towards transcription of the language rather than transliteration of the script. The "Use tilde for nasalization" flag seems too generic, and I think it might not help a non-native speaker differentiate between ṁ and ṅ.

Instead of a post-processing flag, it could be a preprocessing flag specific to the source script (for Devanagari, Gurmukhi, Gujarati) called FinalAnusvaraToNasalIASTISO, which would internally convert the anusvara to ङ so that it is properly captured in the output as ṅ. Would that make sense?
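A rough sketch of such a preprocessing step for Devanagari, using only the standard library. The function name is hypothetical and this is not how Aksharamukha implements anything. It also makes two assumptions beyond the proposal: a virama is appended to ङ so that a transliterator would not add an inherent "a" after it, and "word-final" means "not followed by another Devanagari letter or sign".

```python
import re

ANUSVARA = "\u0902"  # Devanagari sign anusvara
NGA = "\u0919"       # Devanagari letter nga (ङ)
VIRAMA = "\u094d"    # assumption: suppresses the inherent vowel after ङ

# An anusvara is treated as word-final when the next character is not a
# Devanagari letter or sign. Dandas and digits (U+0964..U+0970) are
# deliberately left out of the range so they count as word boundaries.
_FINAL_ANUSVARA = re.compile(ANUSVARA + "(?![\u0900-\u0963\u0971-\u097f])")


def final_anusvara_to_nga(text: str) -> str:
    """Replace every word-final anusvara with ङ + virama."""
    return _FINAL_ANUSVARA.sub(NGA + VIRAMA, text)


print(final_anusvara_to_nga("हां"))      # final anusvara is rewritten
print(final_anusvara_to_nga("संस्कृत"))  # medial anusvara is left alone
```

The explicit code point range is used instead of `\w` or `\b` because Python's `re` does not treat combining marks such as matras and the anusvara as word characters. The same idea would need separate code points for Gurmukhi and Gujarati.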

I am not sure whether the final anusvara in the Eastern Nagari and Sinhala scripts should be transcribed as ṁ or ṅ. (For South Indian languages, I think this might not be necessary.)

virtualvinodh commented 3 years ago

I'll keep this open for now.

AFAIK in most South Asian languages, Anusvara becomes /ṅ/. I'll revisit this at some point to think about how to implement it properly, once I fix other bugs/enhancements.

> Also NasalTilde post_options seems to be missing for Python just like #127

I'm forking the Python package to a different GitHub repository, to keep things clean and separate. It should be available once I push the project to GitHub sometime soon.

V

virtualvinodh commented 3 years ago

@GokulNC

> Makes sense. BTW the backend seems to be down, not sure if that's just for me.

Is the backend working now? If not, could you please clear the cache and try again?

V

GokulNC commented 3 years ago

Yes, it works now, thanks. There was probably some old cached JS code, which is why the API was returning a 500 status.