ronaldaug closed this issue 2 years ago
It seems this repo isn't active or maintained.
That's correct, this repo is for the old Tesseract 3.05 and the legacy OCR recognizer. The more recent repository is https://github.com/tesseract-ocr/langdata_lstm. Should I move this issue to that repo?
Yes please, thanks @stweil.
@ronaldaug, do you want to prepare a pull request which adds shn, maybe based on https://github.com/tesseract-ocr/langdata_lstm/tree/master/mya?
OK, I'll prepare and send a pull request to tesseract-ocr/langdata_lstm based on mya and other languages.
@stweil Sorry for bothering you. Is this repo still active? I've created a PR for the Shan language. Do I have to train it myself?
Yes, the repo is active. I also noticed your pull request, but had no time to review it up to now. Ideally Shan support and training should be done by someone who knows that language (so not by me).
Thanks for your quick response. Though I'm not very familiar with the Tesseract OCR training process, I'll try it.
Could someone help me add the Shan language to Tesseract?
Shan language = https://en.wikipedia.org/wiki/Shan_language
Language code = shn
Shan Wiki = https://shn.wikipedia.org
All Shan words (including IPA) = jsonfile
Websites that are using Shan scripts = https://shannews.org/ , http://shanunicode.com/
Font = https://saosu-mp.github.io/font/PangLong/PangLong.ttf
Shan syllable break = https://github.com/kwarm/syllable-break
Some Shan characters such as
င သ တ ထ ပ မ ယ ရ လ ဝ ႉ း ွ ု ူ ိ ီ ် ၊ ။
are similar to Myanmar (Burmese) characters. Thanks in advance.
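For anyone who wants to check the overlap, here is a minimal sketch in Python (standard library only) that prints the code point and Unicode name of each character listed above and reports whether it falls in the Unicode Myanmar block (U+1000 to U+109F). The names `SAMPLE`, `MYANMAR_BLOCK`, and `describe` are only illustrative and are not part of Tesseract or langdata_lstm.

```python
import unicodedata

# Characters copied from the list above; whitespace between them is ignored.
SAMPLE = "င သ တ ထ ပ မ ယ ရ လ ဝ ႉ း ွ ု ူ ိ ီ ် ၊ ။"

# The Unicode "Myanmar" block spans U+1000..U+109F.
MYANMAR_BLOCK = range(0x1000, 0x10A0)


def describe(text):
    """Return (code point, Unicode name, in Myanmar block?) per non-space character."""
    rows = []
    for ch in text:
        if ch.isspace():
            continue
        code_point = f"U+{ord(ch):04X}"
        name = unicodedata.name(ch, "UNKNOWN")
        rows.append((code_point, name, ord(ch) in MYANMAR_BLOCK))
    return rows


if __name__ == "__main__":
    for code_point, name, in_block in describe(SAMPLE):
        print(code_point, name, in_block)
```

If every row reports `True`, the characters are encoded in the same Unicode block that Burmese uses, which is presumably why the mya data was suggested as a starting point.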