shabados / gurmukhi-utils

Utilities library for converting, analyzing, and testing Gurmukhi strings.
MIT License

Transliteration development - getting rid of extra a's #186

Closed preetcharan closed 1 year ago

preetcharan commented 3 years ago

Describe the bug: Extra a's appear in transliterated words. I've given a couple of examples below; let me know if more are needed.

To Reproduce: Steps to reproduce the behavior: Search: kmkp (kaljug meh keertan pardhaanaa)

Translit that you get: kalajug meh keeratan paradhaanaa |

What I would like: Get rid of the extra a's

Change to: kaljug meh keertan pardhaanaa |

Another example:

guramukh japeeai laae dhiaanaa |

change to:

gurmukh japeeai laae dhiaanaa |

Specs

bhajneet commented 3 years ago

What rule can you use to get rid of the extra characters? If it's something you can explain to a 5 year old we can probably program it in.

preetcharan commented 3 years ago

I think there is a rule somewhere that adds an "a" after certain characters, so removing that should be easy if you know the pattern or the rules. Shall I give more examples to help you see the pattern?


bhajneet commented 3 years ago

The issue is that if you remove every short a, you get things like:

Stnaam, naank, prsaad

How does it know when to add the a between two consonants and when not to?

It has to do with compound words. We could go through our Gurbani and add hyphens for transliteration purposes, which would then get stripped out, but that's a lot of work and a bit odd.

What do you think @Harjot1Singh and @Sarabveer ?
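
To illustrate the problem above, here is a minimal sketch of the naive rule ("drop every short a") applied to already-romanized words. The function name `strip_short_a` is hypothetical and is not part of gurmukhi-utils; it only shows why a blanket removal produces the broken outputs listed above.

```python
import re


def strip_short_a(word: str) -> str:
    """Naively drop every lone 'a' (an 'a' not adjacent to another 'a').

    Long 'aa' is kept. Hypothetical helper for illustration only.
    """
    return re.sub(r"(?<!a)a(?!a)", "", word)


for word in ["satnaam", "naanak", "prasaad", "kalajug"]:
    print(word, "->", strip_short_a(word))
# satnaam -> stnaam
# naanak -> naank
# prasaad -> prsaad
# kalajug -> kljug
```

The rule cannot tell a wanted short a (the first one in "kalajug") from an unwanted one (the second), which is why some extra information about word structure is needed.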

bhajneet commented 3 years ago

Note to self: test a list of 4-letter words with no vowel between consonants 2 and 3.

bhajneet commented 1 year ago

This is better served by a syllabification function, plus manual handling of syllable boundaries using the interpunct (·) in the DB.
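
A rough sketch of how the interpunct idea could work, under several assumptions: the boundary marker is U+00B7 (·), the inherent "a" is added after a consonant only when the consonant is neither followed by a vowel sign nor the last letter of its syllable, and the marker is stripped from the output. The names `transliterate` and `transliterate_syllable` and the tiny mapping tables are hypothetical and cover only the letters needed for the examples; this is not the gurmukhi-utils API, and the spellings are simplified.

```python
# Tiny illustrative subsets, not a complete Gurmukhi mapping.
CONSONANTS = {"ਕ": "k", "ਖ": "kh", "ਗ": "g", "ਜ": "j",
              "ਨ": "n", "ਮ": "m", "ਰ": "r", "ਲ": "l"}
VOWEL_SIGNS = {"ਾ": "aa", "ੁ": "u", "ੀ": "ee"}
BOUNDARY = "\u00b7"  # interpunct, assumed to mark syllable boundaries


def transliterate_syllable(syllable: str) -> str:
    """Romanize one syllable, adding inherent 'a' except syllable-finally."""
    out = []
    for i, ch in enumerate(syllable):
        if ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            is_last = i == len(syllable) - 1
            has_vowel_sign = not is_last and syllable[i + 1] in VOWEL_SIGNS
            if not is_last and not has_vowel_sign:
                out.append("a")  # inherent vowel inside the syllable
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return "".join(out)


def transliterate(word: str) -> str:
    """Split on the boundary marker, romanize each part, and rejoin."""
    return "".join(transliterate_syllable(s) for s in word.split(BOUNDARY))


print(transliterate("ਕਲਜੁਗ"))    # kalajug (no boundary marked)
print(transliterate("ਕਲ·ਜੁਗ"))   # kaljug
print(transliterate("ਗੁਰ·ਮੁਖ"))   # gurmukh
print(transliterate("ਨਾ·ਨਕ"))    # naanak
```

With no boundary marked, the word is treated as a single syllable and the output matches the behavior reported in this issue; marking the boundary drops only the syllable-final "a", so words like naanak keep the vowels they need.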