oliveiracwb / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Tesseract 3.0.2 returns empty string, scaling the image makes it work. Included sample images #1408

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run Tesseract with the included edl.traineddata against any of the included 
images.
2. Observe that it returns an empty string for the images in 
images_become_empty_string.zip and only the first digit for the images in 
images_only_return_first_char.zip.

What is the expected output? What do you see instead?
I expect to get the string of digits that is clearly visible in each image. 
Instead, I get an empty string or only the first digit (see the attached .zip 
files).

What version of the product are you using? On what operating system?
Tesseract 3.0.2 on Windows (used from .NET through the Tesseract wrapper from 
NuGet).

Please provide any additional information below.
It works correctly in 99% of cases (the included images represent the 1% that 
do not work). There does not seem to be any obvious difference between images 
that work and images that hit this bug. Making small adjustments to the pixels 
in the included images, or just scaling them up by 50% or sometimes 100%, makes 
them work.
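
The scaling workaround described above could be automated as a retry loop. The following is only a minimal sketch, not code from the report: `run_ocr` and `rescale` are hypothetical callables that the caller would supply (for example, thin wrappers around an OCR binding and an image library), and the factors 1.0, 1.5 and 2.0 simply mirror the unscaled, +50% and +100% cases mentioned above.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Image = TypeVar("Image")


def ocr_with_rescale(
    image: Image,
    run_ocr: Callable[[Image], str],
    rescale: Callable[[Image, float], Image],
    factors: Iterable[float] = (1.0, 1.5, 2.0),
    expected_length: Optional[int] = None,
) -> Tuple[str, Optional[float]]:
    """Try OCR at each scale factor until the result looks complete.

    run_ocr and rescale are hypothetical callables supplied by the caller;
    they are assumptions of this sketch, not part of any real API.
    Returns (text, factor); factor is None if every attempt came back empty.
    """
    best = ""
    best_factor = None
    for factor in factors:
        # Factor 1.0 means "use the image as-is", so skip the rescale call.
        candidate = image if factor == 1.0 else rescale(image, factor)
        text = run_ocr(candidate).strip()
        # Keep the longest result seen so far as a fallback.
        if len(text) > len(best):
            best, best_factor = text, factor
        # Accept as soon as the text is non-empty and, if the caller knows
        # how many characters to expect, has exactly that length.
        if text and (expected_length is None or len(text) == expected_length):
            return text, factor
    return best, best_factor
```

Passing `expected_length` is one way to catch the "only the first digit" case, since a one-character result would otherwise be accepted as a success. It only helps when the number of digits is known in advance.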

Original issue reported on code.google.com by ostby.e...@gmail.com on 27 Jan 2015 at 11:39

GoogleCodeExporter commented 9 years ago
1. The issue tracker is for language data provided by Google (for custom 
training, use the tesseract user forum).
2. I just checked a few files and IMO they contain wrong DPI information.
3. Some of the files (e.g. edl.elite_font.exp0.tif, edl.elite_font.exp5.tif) 
give the expected OCR result with English and the correct PSM.
4. When I corrected some of the files (e.g. to a reasonable letter size for 300 
DPI), I got the expected result - see attachment. So it looks like you should 
focus more on preprocessing the images.
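
As a rough illustration of what "a reasonable letter size for 300 DPI" might mean in pixels, the sketch below converts a font size in points to a pixel height (1 point is 1/72 inch) and derives the scale factor needed to bring measured glyphs up to that height. The 12 pt target is an assumption made for this example, not a value stated in the comment, and the measured glyph height is treated as a crude stand-in for the font's em height.

```python
def em_height_px(point_size: float, dpi: float) -> float:
    """Pixel height of a font's em box: 1 pt = 1/72 inch, times pixels per inch."""
    return point_size * dpi / 72.0


def scale_factor(
    measured_px: float,
    target_point_size: float = 12.0,  # assumed target, not from the thread
    target_dpi: float = 300.0,
) -> float:
    """Factor by which to resize an image so its letters reach the target size.

    measured_px is the letter height measured in the source image, in pixels.
    """
    if measured_px <= 0:
        raise ValueError("measured_px must be positive")
    return em_height_px(target_point_size, target_dpi) / measured_px


if __name__ == "__main__":
    # Letters measured at 25 px need doubling to match 12 pt at 300 DPI (50 px).
    print(scale_factor(25.0))
```

A factor close to 2 for small digits would be in line with the original report that scaling by +100% sometimes fixes recognition, although the thread itself does not give the letter sizes involved.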

Original comment by zde...@gmail.com on 13 Apr 2015 at 8:20
