Here is another example of completely wrong word segmentation:
Original comment by smaragds...@gmail.com
on 23 Sep 2014 at 7:01
Attachments:
...and then, with slightly modified preprocessing of the image, the same text
is suddenly recognized correctly. If you compare the two images, you will see
that the second one has some additional random pixels. This is the only
difference between them.
This confirms what I wrote above:
the criteria Tesseract uses seem to be very weak and to depend on many
random factors.
There is definitely no space between "J", "eff", and "rey".
So why does Tesseract split text that is clearly a single word?
Original comment by smaragds...@gmail.com
on 23 Sep 2014 at 3:21
Attachments:
Here is another weird example.
In one of the images the text is slightly bolder than in the other. But with
the bolder text, Tesseract does not recognize ANY characters at all.
I would understand if Tesseract misrecognized the "g" as an "8".
But why does it fail to recognize every one of the other characters?
Original comment by smaragds...@gmail.com
on 23 Sep 2014 at 9:13
Attachments:
And here is another related bug.
The image below was analyzed in PSM_AUTO mode.
The original image has two horizontal rulers, which are detected correctly,
but the subsequent processing ignores them completely.
There is a ruler between "Coffee" and "Subtotal", so a new paragraph should
start there. The same applies between "Tax" and "Total".
However, the first horizontal ruler is ignored.
Instead, a new paragraph starts after the first text line, between "Chicken"
and "Chips". This is clearly wrong: the first 4 lines have exactly the same
line spacing, so they should be in the same paragraph.
See image (blue = text block, green = paragraph, red = text line, yellow =
word):
Original comment by smaragds...@gmail.com
on 25 Sep 2014 at 7:05
Attachments:
This bug has been in the "New" state for 6 months.
I have the impression that posting bug reports here is completely pointless.
Original comment by smaragds...@gmail.com
on 21 Mar 2015 at 12:53
Original issue reported on code.google.com by
smaragds...@gmail.com
on 21 Sep 2014 at 5:04
Attachments: