Closed: Shreeshrii closed this 6 years ago
This applies only to 4.0 (not to 3.05).
Hello, I'm a software engineering student and I use the Tesseract OCR engine in a university project. For Persian, the traineddata file produced by training Tesseract 4.00 with the LSTM method gives good output for Arial fonts, but it gives poor results for some specific Persian fonts. So my questions are:
1- Did you use specific fonts such as B Nazanin, B Roya, etc. when training Tesseract 4.00 with LSTM?
2- If they were not used, how can we use these fonts to get better results?
I prepared a text in which every letter form is repeated 10, 15 or more times. I also ran the Tesseract 3.05 training process on this text, but the output did not improve. To get good results for Persian in the Tesseract OCR engine, we need your experience and your help.
Thanks for your attention.
Sincerely.
@aidinkrmz Please see https://github.com/tesseract-ocr/tessdata/issues/70 and post your reply there.
Did you test with the latest BEST Farsi traineddata?
Closing this, as Ray will be updating langdata soon with new files.
In 4.00.00alpha, Devanagari-script languages get better accuracy with the LSTM-only engine than with the combined (legacy + LSTM) mode. Modify the config file to use
tessedit_ocr_engine_mode 1
as the default instead of 2.
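To try the LSTM-only engine without editing any config file, the same mode can be selected per run with the command-line option --oem 1. The Python sketch below only shows one way to build and run such a command. The helper names build_command and run_ocr, the file names, and the default language code hin are assumptions for illustration, not part of the thread.

```python
import shutil
import subprocess

# Engine mode recommended above: 1 = LSTM only, 2 = legacy + LSTM combined.
LSTM_ONLY = 1


def build_command(image_path, output_base, lang="hin", oem=LSTM_ONLY):
    """Return the tesseract command line as a list of arguments.

    `lang` defaults to "hin" (Hindi) purely as an example of a
    Devanagari-script language; substitute the language you need.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "--oem", str(oem)]


def run_ocr(image_path, output_base, lang="hin"):
    """Run tesseract with the LSTM-only engine and return the output file path."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_command(image_path, output_base, lang), check=True)
    # tesseract appends ".txt" to the output base for plain-text output.
    return output_base + ".txt"
```

Calling run_ocr("page.png", "page") would write page.txt, assuming tesseract 4.x and the matching traineddata are installed. Running the same image once with oem=1 and once with oem=2 is a simple way to compare the two modes on your own data.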