meego / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Why are some chars ignored during recognition? #522

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. tesseract.exe b.bmp b

What is the expected output? What do you see instead?
expected: 11 05 11 MC I MG84OE 11202935B
instead:  11 05 11 MC I MG84OE 11202935
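To make the difference between the two strings explicit, a small comparison like the one below can help. This is only a sketch: the helper name `dropped_chars` is hypothetical and not part of Tesseract, and it reports only characters that were dropped outright, not ones that were substituted.

```python
import difflib


def dropped_chars(expected, actual):
    """Return (index_in_expected, char) for characters deleted from expected.

    Hypothetical helper for illustration; not part of Tesseract.
    Substituted characters ("replace" opcodes) are deliberately not reported.
    """
    matcher = difflib.SequenceMatcher(None, expected, actual, autojunk=False)
    missing = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            missing.extend((i, expected[i]) for i in range(i1, i2))
    return missing


expected = "11 05 11 MC I MG84OE 11202935B"
actual = "11 05 11 MC I MG84OE 11202935"
print(dropped_chars(expected, actual))  # [(29, 'B')]
```

For the strings in this report, the only dropped character is the trailing 'B'.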

What version of the product are you using? On what operating system?
Tesseract version 3.0.1 on Windows XP

Please provide any additional information below.
With the attached training data "eng.traineddata", the result is the same as above. With the training data shipped with Tesseract, some chars are recognized slightly differently, but the result is the same: the expected 'B' is ignored.

Possibly the same issue as above:
What steps will reproduce the problem?
1. tesseract withDash.tif withDash batch.nochop makebox

What is the expected output? What do you see instead?
expected: every char in the image should have box data generated
instead:  some chars, such as B and 8, have no box data generated
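One way to check which chars lack box data is to compare the labels in the box file against the expected text. The sketch below assumes the Tesseract 3.x box format, one line per glyph: `<glyph> <left> <bottom> <right> <top> <page>`. The function names are hypothetical and the sample coordinates are made up, not taken from "withDash.box". Because makebox labels each box with the recognized char, a box that exists but carries a wrong label would also be counted as missing here.

```python
from collections import Counter


def parse_box_text(box_text):
    """Parse box lines of the form '<glyph> <left> <bottom> <right> <top> [<page>]'."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        glyph = parts[0]
        left, bottom, right, top = map(int, parts[1:5])
        boxes.append((glyph, left, bottom, right, top))
    return boxes


def chars_without_boxes(expected_text, boxes):
    """Return {char: count} for expected non-space chars with no same-label box."""
    want = Counter(ch for ch in expected_text if not ch.isspace())
    have = Counter(glyph for glyph, *_ in boxes)
    return dict(want - have)


# Made-up sample data for illustration only.
sample = "M 10 5 30 40 0\nG 34 5 52 40 0\n4 78 5 94 40 0\n"
print(chars_without_boxes("MG84", parse_box_text(sample)))  # {'8': 1}
```

Running this against the real box file and the expected text would list the chars (such as B and 8) for which no box was generated.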

Please provide any additional information below.
Used the attached training data "eng.traineddata" as the training data.
The attached tool jTessBoxEditor.jar can be used to view the generated box file "withDash.box".

Original issue reported on code.google.com by iqy...@163.com on 28 Jul 2011 at 11:25

Attachments: