tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Simple recognition does not work. #2939

Open stefanCCS opened 4 years ago

stefanCCS commented 4 years ago

Environment

On Windows 10 (64-bit):

tesseract v5.0.0-alpha.20191030
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5

Current Behavior:

I have a single-line image and run a simple Tesseract call, but Tesseract cannot find any text. My Tesseract call:

tesseract.exe --dpi 300 --psm 7 example.png example -l eng alto txt

Remark: I have found that psm=13 (raw line) works fine.

The image itself is an "artificial" one: written in Word, exported as PDF, and converted with ImageMagick:

convert -trim -density 300 -antialias example.pdf -quality 100 -colorspace GRAY -flatten example.png

example

Expected Behavior:

Text recognized

Suggested Fix:

?

zdenop commented 4 years ago

Use psm 13 (Raw line).
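As a rough illustration of this suggestion, the sketch below tries `--psm 7` first and falls back to `--psm 13` when no text comes back. It is a minimal sketch, not part of Tesseract: the helper names `build_cmd`, `recognize_line`, and `_run_tesseract` are hypothetical, as is the injectable `run` parameter, and it assumes a `tesseract` executable on the PATH. It uses `stdout` as the output base so the text is printed instead of written to a file, which differs from the `alto txt` output in the original call.

```python
import shutil
import subprocess

# Page segmentation modes, as listed by `tesseract --help-psm`.
PSM_SINGLE_LINE = 7   # Treat the image as a single text line.
PSM_RAW_LINE = 13     # Raw line: single text line, bypassing Tesseract-specific hacks.


def build_cmd(image, psm, dpi=300, lang="eng", exe="tesseract"):
    """Build a tesseract command that prints the recognized text to stdout."""
    return [exe, image, "stdout", "--dpi", str(dpi), "--psm", str(psm), "-l", lang]


def recognize_line(image, modes=(PSM_SINGLE_LINE, PSM_RAW_LINE), run=None):
    """Try each PSM in order and return (psm, text) for the first non-empty result.

    `run` is a hypothetical hook: a callable that takes the command list and
    returns the text Tesseract printed. It defaults to a real subprocess call.
    """
    if run is None:
        run = _run_tesseract
    for psm in modes:
        text = run(build_cmd(image, psm)).strip()
        if text:
            return psm, text
    # Every mode produced empty output.
    return None, ""


def _run_tesseract(cmd):
    """Run the command and return its standard output as text."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

Calling `recognize_line("example.png")` would return the mode that produced text together with the text itself, which also shows which of the two modes a given image needs.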

stefanCCS commented 4 years ago

Well, it does work with "raw line", but from a user's perspective (which is mine), I no longer know when to use psm=7 and when to use psm=13 in a case like my example, where there is only a single line of text. Is there any explanation you can share?

GSATHYANARAYANA commented 4 years ago

We are passing a text image (shown below) to the eng.traineddata file, but we get null as output. Could you please help solve this?

no-text