ciyushan closed this issue 6 months ago
Duplicate of https://github.com/nguyenq/tess4j/issues/259
Could you try VietOCR3, the GUI for the Tess4J library?
Thanks for the reply. Following your suggestion, I got the result I needed through the API using VietOCR3-6.12.0. Thank you very much!
The problem is not solved after all; I was wrong earlier. I have now tried debugging with VietOCR3 and found no exceptions, but the cmd result and the result from code still differ.
tessdata file: jslang.traineddata; img: 1.png; tessdata.zip
I suspect the difference lies in how calls from code and calls from cmd handle images, as I have found similar issues in other OCR frameworks. See: https://github.com/hiroi-sora/Umi-OCR/issues/272
@ciyushan Did you setPageSegMode to 3, as the other poster did to fix their issue?
Please attach your test image. Can you also highlight the discrepancy? To us, the two results look very similar.
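One way to pinpoint the discrepancy is to strip all whitespace from both outputs and report the first character that still differs, so that spacing differences do not hide a real recognition difference. The following is a minimal sketch using only the Java standard library; `OcrDiff`, `stripWhitespace`, and `firstDifference` are hypothetical helper names and are not part of Tess4J or Tesseract. The sample strings in `main` are a short fragment taken from the two outputs posted in this issue.

```java
public final class OcrDiff {
    private OcrDiff() {}

    /** Removes every whitespace or space-separator code point (including U+3000). */
    public static String stripWhitespace(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp) && !Character.isSpaceChar(cp))
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /**
     * Returns the index (within the whitespace-stripped text) of the first
     * differing char, or -1 if the two texts are equal once whitespace is ignored.
     */
    public static int firstDifference(String a, String b) {
        String x = stripWhitespace(a);
        String y = stripWhitespace(b);
        int n = Math.min(x.length(), y.length());
        for (int i = 0; i < n; i++) {
            if (x.charAt(i) != y.charAt(i)) {
                return i;
            }
        }
        // One text is a prefix of the other, or they are identical.
        return x.length() == y.length() ? -1 : n;
    }

    public static void main(String[] args) {
        String fromTess4j = "那么儿 里"; // fragment of the Tess4J output
        String fromCmd = "那么 , 里";    // same fragment from the cmd output
        System.out.println(firstDifference(fromTess4j, fromCmd)); // prints 2
    }
}
```

A result of -1 would mean the two outputs differ only in spacing; any other value points at the first character that was actually recognized differently.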
win11, jdk 17
cmd:
tesseract -v
out:
tesseract v5.3.0.20221222
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX512BW
 Found AVX512F
 Found AVX512VNNI
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
 Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0
tess4j.version: 5.10.0
I trained new Chinese language data. With it, tesseract-ocr on the command line recognizes the text exactly as expected, but the result from tess4j is not as good.
The java code is as follows:
import java.io.File;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

// Backslashes in Windows paths must be escaped inside Java string literals.
File imageFile = new File("E:\\tess4j\\src\\main\\resources\\1.png");
ITesseract instance = new Tesseract();
// Directory that contains jslang.traineddata
instance.setDatapath("E:\\tess4j\\src\\main\\resources");
instance.setLanguage("jslang");
try {
    long startTime = System.currentTimeMillis();
    String result = instance.doOCR(imageFile);
    System.out.println("Result:\n" + result);
    long endTime = System.currentTimeMillis();
    System.out.println("Time is:" + (endTime - startTime) + " ms");
} catch (TesseractException e) {
    System.err.println(e.getMessage());
}
out:
那时我惟一的希望, 就在这雷峰塔的倒掉。 后来我长大了, 到杭州, 看见这 破破烂烂的塔, 心里就不舒服。 后来我看看书, 说杭州人又叫这塔作 “保叔 塔” , 其实应该写作 “保 (左人右叔) 塔” , 是钱王的儿子造的。 那么儿 里 面当然没有白蛇娘娘了, 然而我心里仍然不舒服, 仍然希望他倒掉。
现在, 他居然倒掉了, 则普天之下的人民, 其欣喜为何如? 这是有事实可证
cmd:
tesseract 1.png result -l jslang
out:
那时我惟一的希望,就在这雷峰塔的倒掉。后来我长大了,到杭州,看见这 破破烂烂的塔,心里就不舒服。后来我看看书,说杭州人又叫这塔作“保叔 塔” , 其实应该写作 “保 (左人右叔) 塔” , 是钱王的儿子造的。 那么 , 里 面当然没有白蛇娘娘了,然而我心里仍然不舒服,仍然希望他倒掉。
现在, 他居然倒掉了, 则普天之下的人民, 其欣喜为何如? 这是有事实可证
The image and the language data are the same, but the results differ. Is there a version conflict between tess4j and tesseract? Or does tess4j need extra parameters to be set? How can I make tess4j recognize text the same way as the tesseract command line? Please let me know what further information I can provide to help solve the problem.