kcobra / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Error Reading dot matrix characters eg. A #1374

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run tesseract with the English language on the attached images.

What is the expected output? What do you see instead?
I expect the alphabet to be recognized.
Some of the letters, especially A, are being misrecognized.

What version of the product are you using? On what operating system?
latest version from git, under msys2

Please provide any additional information below.

Please see https://groups.google.com/forum/#!topic/tesseract-ocr/p1mVKUlKujY
for the original user report and discussion

Original issue reported on code.google.com by shreeshrii on 5 Nov 2014 at 2:14


GoogleCodeExporter commented 9 years ago
The problem is in the font type:
   convert -blur 0x2 -depth 4 +dither -colors 16 -resize 50% AL2.png al3a.png
   tesseract al3a.png - -psm 7 get.image

You will get:
RBCDEFGHIJKL

So only "A" is a problem (after correct preprocessing). If you look at the binarized 
image that tesseract uses for OCR (tessinput.tif), the "A" looks like an "R"...
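
To illustrate why blurring before binarization helps with a dot-matrix font, here is a toy sketch in plain Python. It is not what ImageMagick or tesseract do internally: the box blur stands in for the Gaussian `-blur`, and the 0.3 threshold, the one-row "image", and all function names are assumptions made for the example. It only shows that separate dots can merge into one connected stroke once the ink is spread into the gaps.

```python
def box_blur(row, radius=1):
    """Mean filter; the window is truncated at the row edges.

    A simple stand-in for a Gaussian blur (assumption for illustration).
    """
    out = []
    for i in range(len(row)):
        window = row[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def binarize(row, threshold=0.3):
    """Mark a pixel as ink (1) when its blurred value reaches the threshold."""
    return [1 if v >= threshold else 0 for v in row]


def count_runs(bits):
    """Count connected runs of ink pixels in a single row."""
    runs = 0
    prev = 0
    for b in bits:
        if b and not prev:
            runs += 1
        prev = b
    return runs


# One row of a dot-matrix stroke: ink (1) separated by one-pixel gaps (0).
dots = [1, 0, 1, 0, 1, 0, 1]
merged = binarize(box_blur(dots))

print(count_runs(dots), count_runs(merged))  # prints "4 1"
```

Without the blur the row is four isolated dots; after it, the row is a single solid run, which is closer to the kind of stroke an OCR engine trained on ordinary fonts would expect.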

Original comment by zde...@gmail.com on 12 Apr 2015 at 8:16
