Open GoogleCodeExporter opened 9 years ago
Hi.
The original Tesseract Arabic language file includes Cube files for the
tesseract-cube-ocr option. You can learn what Cube is from the link below:
https://code.google.com/p/tesseract-ocr-extradocs/wiki/Cube
The Cube option makes OCR slower but gives better results. However, no tool
for tesseract-cube-training has been released yet, so we have to do it manually.
Actually, I am working on the same topic. If I find something new, I will post it.
Original comment by e.velib...@gmail.com
on 2 Feb 2015 at 3:48
I changed "tessedit_ocr_engine_mode" from "1" to "0" to use only the Tesseract
engine. There were no more errors, and in the result file the words were
separated by spaces, but some words were missing! With the Arabic .traineddata
no words were missing. I figured out that the same lines in the config
file that solved the space problem also caused the missing-words problem!
I am working on the config file to solve this.
Original comment by mrfarajp...@gmail.com
on 3 Feb 2015 at 6:59
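For anyone repeating the change described above, here is a minimal sketch of switching "tessedit_ocr_engine_mode" in the text of a config file. It assumes the config uses one "name value" pair per line, separated by whitespace. The helper name set_config_var and the sample contents (including "some_other_option") are hypothetical and only for illustration; they are not taken from the real Arabic config.

```python
def set_config_var(config_text: str, name: str, value: str) -> str:
    """Return config_text with `name` set to `value`, appending it if absent.

    Assumes one whitespace-separated "name value" pair per line.
    """
    out = []
    found = False
    for line in config_text.splitlines():
        parts = line.split()
        if parts and parts[0] == name:
            # Replace the existing setting with the new value.
            out.append(f"{name} {value}")
            found = True
        else:
            # Keep every other line untouched.
            out.append(line)
    if not found:
        out.append(f"{name} {value}")
    return "\n".join(out) + "\n"


# Hypothetical config contents, for illustration only.
sample = "tessedit_ocr_engine_mode 1\nsome_other_option 0\n"
updated = set_config_var(sample, "tessedit_ocr_engine_mode", "0")
print(updated)
```

Going through a helper like this leaves every other line of the config as it was, which makes it easier to test the remaining lines one by one when tracking down which of them causes the missing words.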
Original issue reported on code.google.com by
mrfarajp...@gmail.com
on 20 Jan 2015 at 8:54