brainsucker-na opened 1 week ago
I don't know of any other OCR software that needs DPI information. Ideally, Tesseract should work without it, too. Code contributions toward this goal are welcome, but of course they must not cause any regressions.
Current Behavior
With --dpi 300, Tesseract produces mediocre results for the attached sample image (it misses a couple of words):
sample_crop.zip
Full command line:
tesseract.exe -l rus --dpi 300 sample_crop.png crop300dpi
Expected Behavior
With --dpi 299, Tesseract produces much better results for the same image:
Full command line:
tesseract.exe -l rus --dpi 299 sample_crop.png crop299dpi
I would expect it to perform at a similar level with --dpi 300.
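To make the 299 vs. 300 comparison repeatable, a small script can run both command lines above and list the words that appear at 299 dpi but not at 300 dpi. This is a minimal sketch, assuming the tesseract executable is on PATH and the rus traineddata is installed. The helper names (ocr_text, missing_words, compare_dpi) are hypothetical. Passing "stdout" as the output base makes tesseract print the text instead of writing a .txt file. The word-level diff is only a rough proxy for "misses a couple of words", not a real accuracy metric, because there is no ground-truth transcription.

```python
import subprocess
from collections import Counter


def ocr_text(image: str, dpi: int, lang: str = "rus") -> str:
    """Run the tesseract CLI with an explicit --dpi and return the recognized text.

    Using "stdout" as the output base makes tesseract print the text
    instead of writing <outputbase>.txt.
    """
    result = subprocess.run(
        ["tesseract", "-l", lang, "--dpi", str(dpi), image, "stdout"],
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise if tesseract exits with an error
    )
    return result.stdout


def missing_words(reference: str, candidate: str) -> list[str]:
    """Return words (counting repeats) that occur in reference but not in candidate."""
    # Counter subtraction keeps only the words whose count drops in candidate.
    lost = Counter(reference.split()) - Counter(candidate.split())
    return sorted(lost.elements())


def compare_dpi(image: str, good_dpi: int = 299, bad_dpi: int = 300) -> list[str]:
    """OCR the same image at two DPI values and list the words lost at bad_dpi."""
    return missing_words(ocr_text(image, good_dpi), ocr_text(image, bad_dpi))
```

For example, print(compare_dpi("sample_crop.png")) prints the words that the 300 dpi run drops. Calling compare_dpi with other pairs of values (for example 298 and 301) shows whether 300 is an isolated outlier.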
tesseract -v
tesseract v5.4.0.20240606 leptonica-1.84.1 libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.1) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.3 : libwebp 1.4.0 : libopenjp2 2.5.2 Found AVX2 Found AVX Found FMA Found SSE4.1 Found libarchive 3.7.4 zlib/1.3.1 liblzma/5.6.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.6
UB Mannheim build (setup) with default tessdata files
Operating System
Windows 10
CPU
i7-4700MQ