nguyenq / tess4j

Java JNA wrapper for Tesseract OCR API
Apache License 2.0

tesseract-ocr and Tess4J results are different #259

Closed zymgg closed 7 months ago

zymgg commented 7 months ago

lanpic.zip

System: Windows 10 + JDK 17 + IntelliJ IDEA
tesseract-ocr: v5.3.3.20231005
tess4j: 5.10.0

I successfully trained a new language using LSTM on Windows.

Windows command line: tesseract image27a.jpg output_2 -l num
result: JT5246870293852

Java:

        ITesseract instance = new Tesseract();
        instance.setDatapath("D:\\Tesseract-OCR\\tessdata");
        instance.setLanguage("num");
        String logistics = instance.doOCR(new File("F:\\2024-01\\18090241\\image2\\image27a.jpg"));
        System.out.println(logistics );

result: eee al 18H AAR** = 4 8}

I also added System.setProperty("jna.library.path", "D:\\Tesseract-OCR\\"); and can see it being executed in the debugger, but the result is still different.

nguyenq commented 7 months ago

@zymgg We confirm your findings. When we loaded your image in VietOCR3, which uses Tess4J library, we got good results using either eng or num pack. You may want to step through VietOCR3's code execution for your investigation.

zymgg commented 7 months ago

Thanks! Looking at the source code, I found that setPageSegMode defaults to 3. Thank you for your help.
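For anyone comparing the two paths, one way to rule out page segmentation as the difference is to run both with the same explicit mode. The sketch below only builds the command-line invocation with Tesseract's `--psm` option, so it can be run without Tesseract installed. The class name `PsmCommand`, the helper `build`, and the choice of mode 7 (single text line) are assumptions made for illustration, not something confirmed in this thread. On the Tess4J side, the matching call would be `instance.setPageSegMode(7)` before `doOCR`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds a tesseract CLI command with an explicit
// page segmentation mode, so the CLI and Tess4J can be compared like for like.
public class PsmCommand {

    static List<String> build(String image, String outputBase, String lang, int psm) {
        // Tesseract 4/5 accept page segmentation modes 0 through 13.
        if (psm < 0 || psm > 13) {
            throw new IllegalArgumentException("psm must be between 0 and 13: " + psm);
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outputBase);
        cmd.add("-l");
        cmd.add(lang);
        cmd.add("--psm");
        cmd.add(Integer.toString(psm));
        return cmd;
    }

    public static void main(String[] args) {
        // File names and language are taken from the report above;
        // mode 7 is an assumed value for a single-line tracking number.
        List<String> cmd = build("image27a.jpg", "output_2", "num", 7);
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be passed to `ProcessBuilder` if you want to launch the command from Java. If the outputs still differ at the same mode, the cause is likely elsewhere, for example a different tessdata directory being picked up.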