tesseract-ocr / tessdoc

Tesseract documentation
https://tesseract-ocr.github.io/tessdoc/

Prerequisites and configuration needed to do OCR #20

Open karunakarthadkapally opened 4 years ago

karunakarthadkapally commented 4 years ago

We are using Tesseract 4.0.0. Running OCR from the Linux command line with "tesseract pan.jpg stdout" gives good results, but when we integrate the Tesseract logic into our Java application, it does not give proper results. The same project works fine on a Windows machine. We have already set the TESSDATA_PREFIX environment variable, and both environments use the latest eng.traineddata. Please find the sample code below.

try {
    Tesseract instance = new Tesseract();
    instance.setDatapath("/usr/share/tesseract/");
    File file = new File("/home/projectr/pan.jpg");
    instance.setLanguage("eng");

    String result = instance.doOCR(file);
    System.out.println(result);
} catch (Exception e) {
    e.printStackTrace();
}

If possible, please send a sample Java project that runs on Linux, along with the prerequisites for the Linux machine and any changes needed in configuration files.

We are using Linux version 3.10.0-693.el7.x86_64.

Below are the Tesseract version details on the Linux machine:

tesseract 4.0.0
 leptonica-1.77.0
  libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7
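
One thing that can be checked independently of the OCR wrapper is whether the directory passed to setDatapath actually contains eng.traineddata. Whether the data path should be the directory holding the .traineddata files or its parent can depend on the Tesseract version and the wrapper in use, so the sketch below simply reports where the file is found. The class name TessdataCheck and the method findTrainedData are hypothetical helpers introduced for illustration; the default path is the one from the sample code above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class TessdataCheck {

    /**
     * Hypothetical helper: looks for "<lang>.traineddata" directly under
     * dataPath, then under dataPath/tessdata. Returns the first match.
     */
    public static Optional<Path> findTrainedData(Path dataPath, String lang) {
        String fileName = lang + ".traineddata";

        Path direct = dataPath.resolve(fileName);
        if (Files.isRegularFile(direct) && Files.isReadable(direct)) {
            return Optional.of(direct);
        }

        Path nested = dataPath.resolve("tessdata").resolve(fileName);
        if (Files.isRegularFile(nested) && Files.isReadable(nested)) {
            return Optional.of(nested);
        }

        return Optional.empty();
    }

    public static void main(String[] args) {
        // Defaults mirror the sample code; override via command-line arguments.
        Path dataPath = Paths.get(args.length > 0 ? args[0] : "/usr/share/tesseract/");
        String lang = args.length > 1 ? args[1] : "eng";

        // Print the environment variable as the JVM process sees it,
        // which may differ from what an interactive shell shows.
        System.out.println("TESSDATA_PREFIX=" + System.getenv("TESSDATA_PREFIX"));

        Optional<Path> found = findTrainedData(dataPath, lang);
        System.out.println(found
                .map(p -> "Found " + p)
                .orElse("No " + lang + ".traineddata under " + dataPath));
    }
}
```

Running this under the same user and environment as the Java application shows whether the JVM sees the same TESSDATA_PREFIX and the same trained data file as the command-line run.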

stweil commented 4 years ago

Tesseract 4.0.0 is unsupported. Please use a newer version, either Tesseract 4.1 or latest Tesseract from git.
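
To confirm which version a given machine actually has, the first line of "tesseract --version" can be parsed and compared against 4.1. The following is a minimal sketch using only the Java standard library; the class name TesseractVersionCheck and its methods are hypothetical, and it assumes the tesseract executable is on the PATH. Note that it inspects the command-line tool only: a Java wrapper that loads the Tesseract shared library directly may pick up a different installation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TesseractVersionCheck {

    // Matches lines such as "tesseract 4.0.0" or "tesseract v5.0.0-alpha".
    private static final Pattern VERSION =
            Pattern.compile("tesseract\\s+v?(\\d+)\\.(\\d+)(?:\\.(\\d+))?");

    /** Returns {major, minor, patch}, or null if the line is not recognized. */
    public static int[] parseVersion(String line) {
        Matcher m = VERSION.matcher(line == null ? "" : line.trim());
        if (!m.find()) {
            return null;
        }
        int patch = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return new int[] {
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), patch
        };
    }

    /** True if the parsed version is at least major.minor. */
    public static boolean isAtLeast(int[] v, int major, int minor) {
        return v != null && (v[0] > major || (v[0] == major && v[1] >= minor));
    }

    /** Runs "tesseract --version" and returns the first line of its output. */
    public static String readVersionLine() throws IOException, InterruptedException {
        // Merge stderr into stdout, since some builds print the version to stderr.
        Process p = new ProcessBuilder("tesseract", "--version")
                .redirectErrorStream(true)
                .start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String first = r.readLine();
            while (r.readLine() != null) {
                // Drain the remaining output so the process can exit.
            }
            p.waitFor();
            return first;
        }
    }

    public static void main(String[] args) throws Exception {
        String line = readVersionLine();
        int[] v = parseVersion(line);
        System.out.println("Reported: " + line);
        System.out.println(isAtLeast(v, 4, 1)
                ? "Version is 4.1 or newer."
                : "Version is older than 4.1 or could not be parsed.");
    }
}
```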