tesseract-ocr and tess result are different

zymgg commented 7 months ago

system: Windows 10 + JDK 17 + IDEA
tesseract-ocr: v5.3.3.20231005
tess4j: 5.10.0

Successfully trained a new language using LSTM on Windows.

Windows command line:

        tesseract image27a.jpg output_2 -l num

result: JT5246870293852

Java:

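        import java.io.File;
        import net.sourceforge.tess4j.ITesseract;
        import net.sourceforge.tess4j.Tesseract;

        // Runs inside a method that handles TesseractException.
        ITesseract instance = new Tesseract();
        instance.setDatapath("D:\\Tesseract-OCR\\tessdata");  // directory holding the traineddata files
        instance.setLanguage("num");                          // the custom-trained language
        String logistics = instance.doOCR(new File("F:\\2024-01\\18090241\\image2\\image27a.jpg"));
        System.out.println(logistics);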

result: eee al 18H AAR** = 4 8}

I also set System.setProperty("jna.library.path", "D:\\Tesseract-OCR\\"); and can see in the debugger that this line runs, but the result is still different.

nguyenq commented 7 months ago

@zymgg We confirm your findings. When we loaded your image in VietOCR3, which uses Tess4J library, we got good results using either eng or num pack. You may want to step through VietOCR3's code execution for your investigation.

zymgg commented 7 months ago

Thanks!!! I looked at the source code and found that setPageSegMode defaults to 3. Thank you for your help.
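
One way to check whether the page segmentation mode explains the difference is to run the command-line tool with an explicit `--psm` value and compare the outputs. The sketch below only builds and prints those command lines, using plain JDK classes. The class name `TesseractPsmCommand`, the helper `build`, the output base names, and the choice of mode 7 (treat the image as a single text line) are assumptions for illustration; the thread does not say which mode gave the good result. On the Tess4J side, the corresponding call would be `instance.setPageSegMode(...)` with the same number.

```java
import java.util.ArrayList;
import java.util.List;

public class TesseractPsmCommand {

    // Builds: tesseract <image> <outputBase> -l <lang> --psm <psm>
    // (hypothetical helper, not part of Tess4J or Tesseract)
    static List<String> build(String image, String outputBase, String lang, int psm) {
        // Tesseract 5 accepts page segmentation modes 0 through 13.
        if (psm < 0 || psm > 13) {
            throw new IllegalArgumentException("psm must be between 0 and 13: " + psm);
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outputBase);
        cmd.add("-l");
        cmd.add(lang);
        cmd.add("--psm");
        cmd.add(Integer.toString(psm));
        return cmd;
    }

    public static void main(String[] args) {
        // Print one command per mode so both can be run by hand and compared.
        // Mode 3 is the default mentioned above; mode 7 is an assumed alternative.
        for (int psm : new int[] {3, 7}) {
            List<String> cmd = build("image27a.jpg", "output_psm" + psm, "num", psm);
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

If the command line and the Java call still disagree when both use the same mode, the cause is probably elsewhere, for example a different tessdata directory being picked up.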

nguyenq / tess4j

tesseract-ocr and tess result are different #259