virtualvinodh / aksharamukha


Word-final Schwa Deletion if preceded by a half-consonant #133

Closed GokulNC closed 2 years ago

GokulNC commented 3 years ago

If there is a half-consonant before the final consonant, the schwa of the final consonant must not be deleted.

For example, words like "मध्य" (as in Madhya Pradesh) should be transcribed as "madhya", but currently it becomes "madhy". Similarly for राज्य (raajya), रत्न (ratna), etc.

It works well for words like वस्त्र (I don't know how).

However, for words like वृक्ष (vrksh), the final schwa should be deleted, treating क्ष as a single consonant rather than a ligature. Also note that the schwa should be deleted for final consonants preceded by an anusvara, as in हिंद (hind).

I'm not sure whether this applies only to Hindi or also to other languages such as Gujarati, Punjabi, etc.


Below are also some failure cases that I'm not sure how to handle (since I do not know Hindustani grammar):

  1. गर्भवती - garbhavatī (Currently transcribed as garbhavtī, but works well for सरस्वती )
  2. हिन्द - hind (Currently transcribes correctly, but if above rules are incorporated, it will fail. Though the correct form is हिंद , misspellings like these are a bit common these days.)
  3. समाप्त should be "samaapt" and अन्त should be "ant", but अन्य should be "anya". (I think these inconsistencies are because of Sanskritization of Hindustani in making it Hindi)

Will edit and add more failure cases as and when I come across them. Thanks!

vbharadvaja commented 3 years ago

I think the actual rules governing word-final schwa deletion/retention are more complex. This will explain the 'exceptions' as well. This is what I can tell:

  1. Any cluster ending in य or र retains the schwa.
  2. Any cluster ending in स श ष deletes the schwa.
  3. Geminated clusters (द्ध, च्छ, त्त, etc) delete the schwa. (Thanks to @GokulNC for pointing this out)

For the remaining cases, refer to the arrangement of letters below:

ट त च क प श ष स न म य र ल व

(Voiced/aspirated forms may be considered to be in the same row as their unvoiced, unaspirated counterparts. Other nasals and the anusvara may be lumped into the 5th row.)

  1. If the two final letters in the cluster are in ascending order per the table (i.e. the latter letter in the cluster belongs to a higher row on the table), delete the schwa.
  2. If descending order, retain the schwa.
  3. If both are of the last row, delete the schwa.
  4. If both are otherwise of the same row, retain the schwa.

Earlier rules should take precedence over later rules. As far as I can tell, only the last two consonants in the cluster are relevant.
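The rules above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not Aksharamukha's actual implementation: the names `retains_final_schwa`, `row_of`, `last_row` and `EXAMPLE_ROWS` are hypothetical. Rows are assumed to be numbered so that a row higher on the screen gets a larger number. `EXAMPLE_ROWS` is not the full table; it encodes only one relation stated in this thread (न is lower on the table than त), with arbitrary numbers.

```python
VIRAMA = "\u094d"  # Devanagari virama (halant), which joins a consonant cluster

# Aspirated -> unaspirated, and voiced -> unvoiced counterparts.
DEASPIRATE = {"ख": "क", "घ": "ग", "छ": "च", "झ": "ज", "ठ": "ट",
              "ढ": "ड", "थ": "त", "ध": "द", "फ": "प", "भ": "ब"}
DEVOICE = {"ग": "क", "ज": "च", "ड": "ट", "द": "त", "ब": "प"}

# ASSUMPTION: illustrative numbers only, capturing "न is lower than त".
EXAMPLE_ROWS = {"त": 2, "न": 1}


def plain(consonant):
    """Map a voiced/aspirated stop to its unvoiced, unaspirated counterpart."""
    consonant = DEASPIRATE.get(consonant, consonant)
    return DEVOICE.get(consonant, consonant)


def retains_final_schwa(word, row_of, last_row):
    """Return True (retain), False (delete), or None (rules do not apply).

    Only the last two consonants of a word-final cluster are inspected.
    `row_of` maps a plain consonant to its row number (hypothetical input);
    `last_row` is the number assigned to the last row of the table.
    """
    if len(word) < 3 or word[-2] != VIRAMA:
        return None  # word does not end in consonant + virama + consonant
    first, last = word[-3], word[-1]

    if last in ("य", "र"):          # rule 1: cluster ends in य or र
        return True
    if last in ("स", "श", "ष"):     # rule 2: cluster ends in a sibilant
        return False
    if DEASPIRATE.get(first, first) == DEASPIRATE.get(last, last):
        return False                # rule 3: geminated cluster (द्ध, च्छ, त्त)

    r1, r2 = row_of.get(plain(first)), row_of.get(plain(last))
    if r1 is None or r2 is None:
        return None                 # letter missing from the supplied table
    if r2 > r1:
        return False                # ascending order: delete the schwa
    if r2 < r1:
        return True                 # descending order: retain the schwa
    return r1 != last_row           # same row: delete only in the last row


# last_row=0 here because no letter in EXAMPLE_ROWS is in the last row.
for w in ("अन्त", "रत्न", "मध्य", "वृक्ष", "बुद्ध"):
    print(w, retains_final_schwa(w, EXAMPLE_ROWS, last_row=0))
```

With this partial table, अन्त comes out as "delete" (ant) and रत्न as "retain" (ratna), while मध्य, वृक्ष and बुद्ध are decided by the first three rules without consulting the table at all. Passing the table in as `row_of` keeps the sketch independent of the exact row layout.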

This should explain most of the clusters that occur in Hindi. It will obviously fail on some clusters that don't occur in Hindi (at least afaik). Please feel free to add corrections or point out any exceptions that I have missed.

Also, हिन्द is the correct spelling. That is how it is pronounced. हिंद makes needless use of the anusvara, which has a technically different pronunciation. And the issue is definitely not one of Sanskritization, just one of feasibility in pronouncing complex clusters clearly.

GokulNC commented 3 years ago

Thanks a lot @vbharadvaja . Did not know this.

The ascending/descending order seems a bit confusing to me.

  1. In the case of महत्वपूर्ण (mahatvapurna), the row of र (first consonant in cluster) is below the row of न. In this case, if I assume it as descending order and retain the schwa, the transcription is correct.

  2. But in the case of अन्त (ant) too, the row of न is below the row of त. As in (1), if I treat this as descending order, I'll have to retain the schwa, but the transcription would be wrong (anta).

Please let me know if I have misinterpreted something.

vbharadvaja commented 3 years ago

र्ण and न्त would both be ascending order. न is lower on table than त.

So it works for अन्त. I was under the impression that महत्वपूर्ण is mahatvapurn. I am not a native speaker of Hindi, so I may be wrong there. (I will however note that Google returns 179k results for 'mahatavpurn' and only 83.4k results for 'mahatvapurna'. Google Translate also renders the pronunciation as 'mehetvapurn'.) But if "mahatvapurna" is indeed the prevalent pronunciation, there are likely further nuances to it beyond what I've stated.

GokulNC commented 3 years ago

Thanks. I am not sure either, since I am not a Hindi speaker. But yes, the table helps in covering almost all Hindi words with a final consonant cluster.

Another peculiar example I came across when testing was बद्ध (as in असंबद्ध), for which the IPA seems to be /bəd̪d̪ʱ/.

Similarly, I'm not sure whether Hindi speakers would read बुद्ध as buddh or buddha. The same goes for शुद्ध. (There should probably be some rule for aspirated, gemination-like consonant clusters.)

vbharadvaja commented 3 years ago

For words ending in द्ध, I have heard the schwa deleted regularly. So yes, an extra rule to account for geminated clusters is needed. Will update my original comment with that.

virtualvinodh commented 3 years ago

@vbharadvaja @GokulNC

Thanks for all this, guys. It will be great to implement these rules and improve the Schwa-deletion for Hindi.

V

virtualvinodh commented 2 years ago

@vbharadvaja Sorry. Just starting to close issues on GitHub.

Regarding your arrangement of letters: where do I start the numbering to decide the ascending or descending order? It looks like I should go from bottom to top, but I just wanted to confirm that this is what is intended.

V

vbharadvaja commented 2 years ago

@virtualvinodh sorry for the late reply.

Yes, it should be bottom to top. I had meant ascending/descending with regard to the spatial position of the row on the screen. Numbering from bottom to top captures the same thing.

virtualvinodh commented 2 years ago

This has been fixed. I'll push the update next week or so.