tesseract-ocr / langdata

Source training data for Tesseract for lots of languages
Apache License 2.0

Santali Language (Ol Chiki script) OCR #153

Open Prasanta-Hembram opened 4 years ago

Prasanta-Hembram commented 4 years ago

Hello everyone! I am new to coding, but when I came across Tesseract I thought I would give it a try. I have the same issue as Balinese Script OCR #152, but in my case I am using jTessBoxEditor 2.2.1 with Noto Sans Ol Chiki as the main Unicode font. In fact, this language has many Unicode fonts. I have followed Indic-ocr, but I have been unable to contact them to ask how they created and trained the Santali language data; they also have not mentioned the sat.traineddata version. I searched all the repositories for langdata for it but found none. I have tried to train this language myself, but I keep getting too many errors. What is the most reliable way to train this language?

Fonts list: https://github.com/indicocr/tessdata/tree/master/sat