tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Tesseract cannot recognize grey text on a black background #4237

Closed RunzhongK-AI closed 5 months ago

RunzhongK-AI commented 5 months ago

Current Behavior

Tesseract can extract white text on a black background, but not grey text on a black background.

Expected Behavior

Extract both the white text and the grey text.

Suggested Fix

No response

tesseract -v

tesseract 5.3.4
leptonica-1.84.1
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.2.11 : libwebp 1.4.0 : libopenjp2 2.5.2
Found AVX512BW
Found AVX512F
Found AVX512VNNI
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.7.2 zlib/1.2.11 liblzma/5.4.4 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.5
Found libcurl/8.1.2 SecureTransport (LibreSSL/3.3.6) zlib/1.2.11 nghttp2/1.51.0

Operating System

No response

Other Operating System

macOS

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

Here is my Tesseract result:

To ensure a seamless end-to-end customer experience, marketers have a wide range of tools and platforms at their disposal. However, our findings suggest that, on an average, less than fourin 10 organizations use such tools extensively to map and track customer interactions and touchpoints (see Figure 11).

Linda Ha, Global Customer Engagement & Loyalty Manager, lkea Ingka, says:

[Attached image: output-onlinepngtools (10)]

zdenop commented 5 months ago

Read and follow https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md#binarisation
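
For readers landing here, a rough sketch of the kind of preprocessing that section points toward: binarise the image yourself and invert it, so Tesseract receives dark text on a light background. This is only an illustration in plain Python, not what Tesseract does internally line for line. The function names `otsu_threshold` and `binarize_light_on_dark` are made up for this example, and the image is assumed to be already loaded as rows of 8-bit grey values (0 to 255).

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold t for 8-bit grey values.

    Pixels with value > t are treated as the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class (values <= t)
    sum0 = 0    # sum of grey values in the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor).
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize_light_on_dark(image):
    """Binarise and invert an image with light text on a dark background.

    image: list of rows, each a list of ints in 0..255.
    Returns rows of 0/255 with text black (0) on white (255).
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p > t else 255 for p in row] for row in image]


# Toy image: black background, one grey "text" row, one white "text" row.
image = [
    [0, 0, 0, 0],
    [0, 128, 128, 128],
    [0, 255, 255, 255],
    [0, 0, 0, 0],
]
result = binarize_light_on_dark(image)
```

In this toy case both the grey and the white pixels end up as black text on white. On a real scan a single global threshold may still drop dim grey text, in which case a manually chosen or locally adaptive threshold is worth trying before passing the image to Tesseract.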