Single word recognition FAIL

jacklicn / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr


Single word recognition FAIL #519

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Take a picture containing a single word.
2. Run Tesseract OCR on it: it recognizes nothing in the picture.

Original issue reported on code.google.com by Christia...@googlemail.com on 15 Jul 2011 at 1:24

GoogleCodeExporter commented 9 years ago

Please provide an example.

Original comment by zde...@gmail.com on 15 Jul 2011 at 8:08

GoogleCodeExporter commented 9 years ago

I've attached the *.bmp file. Pictured is the German word "Ist". Tesseract OCR 
with language set to "deu" will not recognize the word.

Original comment by Christia...@googlemail.com on 19 Jul 2011 at 9:01

Attachments:

ocr_fail_deu.bmp

GoogleCodeExporter commented 9 years ago

I've experienced this too. In my case, I use a different algorithm to cut the 
words out of an image that is more complex than Tesseract can handle. Recognition 
seems to improve when the image contains several words in roughly the same font 
and size. Mixing large and small fonts on the same image also seems to lower 
recognition.

Original comment by joakim.a...@gmail.com on 14 Nov 2011 at 9:08

GoogleCodeExporter commented 9 years ago

Tesseract 3.01 handles it correctly when the page segmentation mode is set to 
treat the image as a single word (-psm 8):
tesseract ocr_fail_deu.bmp ocr_fail_deu.bmp -l deu -psm 8

Original comment by zde...@gmail.com on 14 Nov 2011 at 3:57

Changed state: WorksForMe
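The fix in the last comment is to force single-word page segmentation (mode 8). Below is a minimal Python sketch of scripting that invocation, not an official wrapper. It assumes a `tesseract` executable on PATH with the German (`deu`) language data installed. The helper names `single_word_command` and `recognize_single_word` are hypothetical. The 3.x releases spell the option `-psm`, while 4.x and later use `--psm`, so the helper takes the major version as a parameter.

```python
import shutil
import subprocess

# Page segmentation mode 8 tells Tesseract to treat the image as a single word.
SINGLE_WORD_PSM = 8


def single_word_command(image_path, output_base, lang="deu", major_version=4):
    """Build the tesseract argv for single-word recognition (hypothetical helper).

    Tesseract 3.x spells the option '-psm'; 4.x and later use '--psm'.
    """
    psm_flag = "-psm" if major_version < 4 else "--psm"
    return ["tesseract", image_path, output_base, "-l", lang,
            psm_flag, str(SINGLE_WORD_PSM)]


def recognize_single_word(image_path, lang="deu"):
    """Run tesseract (assumed 4.x or later on PATH) and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # Output base "stdout" makes tesseract print the text instead of writing a .txt file.
    cmd = single_word_command(image_path, "stdout", lang=lang)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

With `major_version=3`, `single_word_command("ocr_fail_deu.bmp", "ocr_fail_deu.bmp")` reproduces the 3.01 command from the thread. `recognize_single_word("ocr_fail_deu.bmp")` is the equivalent call for newer releases.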