oliveiracwb / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

A very weird bug #1446

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I use the Spanish traineddata to recognize the Spanish word "CAJERO" (= 
cashier).
Tesseract recognizes the word without problems.

But if there are several asterisks next to the word, it does not recognize a 
single character.

There are two bugs:

First: 
The bounding rectangles for the word (yellow) and the line (red) are correct.
This means that Tesseract has detected that the characters of CAJERO exist 
beside the asterisks, but it does not return them.

Second: 
The asterisks themselves are recognized completely wrong. Tesseract recognizes a 
symbol (cyan) INSIDE another symbol: as you can see, there is a cyan rectangle 
INSIDE another cyan rectangle.
One asterisk is recognized as if it were two characters.
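
To make the "rectangle INSIDE another rectangle" symptom checkable without looking at the image, here is a minimal sketch that flags symbol boxes fully enclosed by another symbol box. It does not call Tesseract at all: the `Box` type, the helper names, and the coordinates are hypothetical and made up for illustration, and the boxes are assumed to be (left, top, right, bottom) in pixels.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Hypothetical symbol bounding box in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def contains(outer: Box, inner: Box) -> bool:
    """True if inner lies entirely within outer and is not the same box."""
    return (
        outer != inner
        and outer.left <= inner.left
        and outer.top <= inner.top
        and outer.right >= inner.right
        and outer.bottom >= inner.bottom
    )


def nested_pairs(boxes: List[Box]) -> List[Tuple[int, int]]:
    """Return (outer_index, inner_index) for every box enclosed by another."""
    return [
        (i, j)
        for i, outer in enumerate(boxes)
        for j, inner in enumerate(boxes)
        if i != j and contains(outer, inner)
    ]


# Made-up coordinates, NOT taken from the attached image.
symbol_boxes = [
    Box(10, 10, 40, 40),  # first asterisk
    Box(18, 18, 30, 30),  # spurious symbol reported inside the first asterisk
    Box(50, 10, 80, 40),  # second asterisk
]

print(nested_pairs(symbol_boxes))  # prints [(0, 1)]
```

Running a check like this over the symbol boxes from the failing image and from the asterisks-only image would show whether the nested symbol only appears when the word is present.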

If the asterisks stand alone, they are recognized correctly.
So this is not a problem with the traineddata.

Original issue reported on code.google.com by smaragds...@gmail.com on 7 Apr 2015 at 6:33

Attachments:

GoogleCodeExporter commented 9 years ago
Addition: 
If I reduce the number of asterisks to two or three, they are recognized 
correctly together with "CAJERO".

Original comment by smaragds...@gmail.com on 7 Apr 2015 at 6:37

GoogleCodeExporter commented 9 years ago
What version of Tesseract are you using?

Original comment by zde...@gmail.com on 12 Apr 2015 at 4:07