tesseract-ocr / langdata

Source training data for Tesseract for lots of languages
Apache License 2.0

Devanagari script texts in non-Hindi languages OCR better with hin.traineddata #74

Closed. Shreeshrii closed this issue 6 years ago

Shreeshrii commented 7 years ago

Improve langdata and traineddata for Marathi, Nepali, Sanskrit, etc. Currently, with the traineddata as of 4.0.0-alpha, Devanagari text in these languages gets better OCR results with hin.traineddata than with the language's own traineddata.

Please see more details at

https://github.com/tesseract-ocr/tesseract/issues/729
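
One rough way to check the comparison described above is to run the same page image through two models and score each output against a hand-typed reference. The following is only a sketch under assumptions: the helper names (build_command, edit_distance, char_error_rate, compare_models) and the choice of character error rate as the metric are illustrative and not taken from this issue, and it assumes the tesseract executable plus the hin and mar traineddata files are installed.

```python
import subprocess


def build_command(image_path, lang):
    """Build a tesseract CLI call that writes recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def char_error_rate(recognized, reference):
    """Edit distance divided by reference length (lower is better)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(recognized, reference) / len(reference)


def compare_models(image_path, reference, langs=("hin", "mar")):
    """Run tesseract once per language model and return {lang: error rate}.

    Hypothetical helper: requires tesseract and the listed traineddata files.
    """
    results = {}
    for lang in langs:
        completed = subprocess.run(build_command(image_path, lang),
                                   capture_output=True, text=True, check=True)
        results[lang] = char_error_rate(completed.stdout.strip(),
                                        reference.strip())
    return results
```

Calling compare_models("page.png", reference_text) on a Marathi page would return one error rate per model. Note that Devanagari vowel signs and other combining marks count as separate code points here, so the rate is only a coarse indicator.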

Shreeshrii commented 6 years ago

This is no longer the case with the best 4.0alpha traineddata files.