Open codito opened 4 years ago
This code can be imported into the https://github.com/sanskrit-coders/indic_transliteration/tree/master/indic_transliteration/font_converter module.
Looks like indic2unicode is now available for Python 3. I've raised an issue in sudhAnt's repo.
Regarding the accent recognition issue: the string ÊSÉSÉjÉÉè VÉMÉMÉiÉÉä should be converted to चि॒त्रौ जग॑तो, rather than to चिचत्रौ जगगतो. So, essentially, ÉS and ÉM should be replaced with the ॒ and ॑ svaras respectively beforehand.
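The expected conversion above can be pinned down as a regression case. This is only a minimal sketch: `convert` stands for any hypothetical callable that maps Surekh-encoded text to Unicode, and `failing_cases` is an illustrative name, not part of indic2unicode or indic_transliteration.

```python
# Input/expected pair taken from the example in this thread.
CASES = [
    ("ÊSÉSÉjÉÉè VÉMÉMÉiÉÉä", "चि॒त्रौ जग॑तो"),
]


def failing_cases(convert, cases=CASES):
    """Return (source, expected, actual) for every case the converter gets wrong.

    `convert` is a hypothetical callable: legacy-font text in, Unicode text out.
    """
    failures = []
    for source, expected in cases:
        actual = convert(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures
```

An empty list means every listed string converts as expected. More pairs from the thread can be appended to `CASES` as they are confirmed.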
This naive technique did not work:
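import regex

def convert_handling_svaras(self, text):
    # Swap the legacy-font sequences for Unicode svaras before converting.
    text = regex.sub("ÉS", "॒", text)
    text = regex.sub("ÉM", "॑", text)
    out_text = self.convert(text=text)
    # Move a svara that landed before a vowel sign or modifier to after it.
    out_text = regex.sub("([॒॑])([ा-ॏऀ-ः])", "\\2\\1", out_text)
    return out_text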
Yielded स्ा॑मैगक्षिष्योचध्ध्वम्ा॑हस स्ा॑मैगक्षिष्योचध्ध्वम्ा॑हसआदिचत्येन्ा॑
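To see where the svaras end up in output like this, it can help to list the code points one by one. A small sketch using only the standard-library `unicodedata` module; the helper name `describe` is illustrative, and the sample string is what the first cluster of the output above appears to be (sa, virama, aa-matra, udatta).

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per character, in order."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# Assumed sample: sa + virama + vowel sign aa + udatta.
for line in describe("\u0938\u094d\u093e\u0951"):
    print(line)
```

A listing like this makes it easy to spot suspicious sequences, such as a virama directly followed by a vowel sign.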
Here's a script that attempts to convert text in Surekh font to Unicode: https://gist.github.com/codito/cb31ba37b0a4e5a77dc03c84a3ebc50d
Underlying library is here: https://github.com/sushant354/indic2unicode
Tasks
Port the indic2unicode package to Python 3
आदिचत्येनग सचहीयगसा
should be आदि॒त्येन॑ स॒हीय॑सा
Sanskrit programmers email thread for reference: https://groups.google.com/forum/#!msg/sanskrit-programmers/erYjhaqAciQ/Yha8ho6QAQAJ