indic-transliteration / sanscript.js

Transliteration package for Indian scripts
MIT License

Conversion from shaaradaa is broken #63

Open grosmar opened 1 year ago

grosmar commented 1 year ago

If you try to convert Sharada to anything, the result is broken. The reason is probably that Sharada characters are not represented by a single UTF-16 code unit each, but by 2 or more: https://github.com/indic-transliteration/common_maps/blob/master/brahmic/sharada.toml


I don't know whether there is a Sharada map that works with single code units. If not, the matching should be changed from charAt to a more sophisticated (and more problematic) solution: https://github.com/indic-transliteration/sanscript.js/blob/master/src/sanscript.js#L375
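
To illustrate why charAt is a problem here, the following is a minimal sketch, not taken from sanscript.js. It assumes only that the Sharada block sits at U+11180..U+111DF, outside the Basic Multilingual Plane, so JavaScript stores each letter as a surrogate pair. The variable names are made up for the example.

```javascript
// U+11191 is a code point in the Sharada block (U+11180..U+111DF).
const ka = "\u{11191}";

// length and charAt count UTF-16 code units, so one letter looks like two "characters".
const unitCount = ka.length;
const firstUnit = ka.charAt(0); // a lone high surrogate, not the letter itself
const isLoneSurrogate = firstUnit.length === 1 && firstUnit !== ka;

// Iterating the string (spread, for...of, Array.from) walks code points instead.
const codePoints = [...ka];
const codePointHex = ka.codePointAt(0).toString(16);

console.log({ unitCount, isLoneSurrogate, codePointCount: codePoints.length, codePointHex });
```

A lookup keyed on `ka.charAt(0)` can therefore never hit a map entry whose key is the full letter.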

If Sharada characters consistently consist of 2 code units, this could be marked in the mapping, and substring could be used instead of charAt. If not, we need to check whether any string in the mapping matches the beginning of the string being examined, and always cut the matched character off the beginning.
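
The second option (match a map key against the beginning of the remaining string, then cut it off) could look roughly like the sketch below. This is only an illustration under assumptions: `transliterateByPrefix` and `sampleMap` are hypothetical names, the sample entries are illustrative and not copied from sharada.toml, and it is not how sanscript.js is currently written.

```javascript
// Hypothetical mini map; keys may span several UTF-16 code units.
const sampleMap = new Map([
  ["\u{11191}\u{111C0}", "k"], // two code points, four code units (illustrative value)
  ["\u{11191}", "ka"],         // one code point, two code units
  ["\u{11183}", "a"],
]);

function transliterateByPrefix(input, map) {
  // Longest key length in code units bounds how far ahead we need to look.
  let maxLen = 0;
  for (const key of map.keys()) maxLen = Math.max(maxLen, key.length);

  let out = "";
  let i = 0;
  while (i < input.length) {
    let matched = false;
    // Try the longest candidate first so longer keys win over their prefixes.
    for (let len = Math.min(maxLen, input.length - i); len > 0; len--) {
      const candidate = input.slice(i, i + len);
      if (map.has(candidate)) {
        out += map.get(candidate);
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No key matches here: copy one whole code point through unchanged.
      const cp = String.fromCodePoint(input.codePointAt(i));
      out += cp;
      i += cp.length;
    }
  }
  return out;
}

const result = transliterateByPrefix("\u{11191}\u{111C0}\u{11191}!", sampleMap);
console.log(result);
```

Trying the longest candidate first avoids having to mark a fixed width in the mapping, at the cost of a few extra lookups per position.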