jacklicn / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

APPLY_BOXES: FAILURE! Couldn't find a matching blob #585

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I made a box file using bbTesseract, and for one of the characters I get
"APPLY_BOXES: ... FAILURE! Couldn't find a matching blob".

The image is 26x241. Tesseract complains only about the last line, and I don't 
see what is wrong with these coordinates:

1 7 416 19 446
2 7 377 19 407
3 7 336 19 366
4 7 295 19 325
5 7 254 19 284
6 7 213 19 243
7 7 172 19 202
8 7 132 19 162
9 7 91 19 121
0 7 50 19 80
: 7 5 19 35

Original issue reported on code.google.com by sorin.sb...@gmail.com on 24 Nov 2011 at 4:13

GoogleCodeExporter commented 9 years ago
The command I ran was:
tesseract eng.counter.exp0.png end.counter.exp0 nobatch box.train.stderr

Original comment by sorin.sb...@gmail.com on 24 Nov 2011 at 4:24

GoogleCodeExporter commented 9 years ago
That is the old 2.0 format. The 3.0 format has a zero-based page number as the 
last field on each line. Try adding another column containing 0 to your box file.
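
The suggestion above can be sketched as a small script. This is only an illustration, not part of Tesseract: the function name add_page_column is hypothetical, and it assumes each old-style line has exactly five whitespace-separated fields (symbol, left, bottom, right, top) and that every box belongs to page 0.

```python
def add_page_column(lines, page=0):
    """Append a page number to box lines that have only five fields.

    Hypothetical helper: assumes old-style lines look like
    "<symbol> <left> <bottom> <right> <top>".
    """
    converted = []
    for line in lines:
        fields = line.split()
        if not fields:
            # Skip blank lines.
            continue
        if len(fields) == 5:
            # Old-style line: add the page number as a sixth field.
            fields.append(str(page))
        # Lines that already have six fields are passed through unchanged.
        converted.append(" ".join(fields))
    return converted


# Two lines taken from the box file in the original report.
old_lines = ["1 7 416 19 446", ": 7 5 19 35"]
for new_line in add_page_column(old_lines):
    print(new_line)
```

To convert a whole file, read its lines, pass them through the function, and write the result back to a .box file with the same base name as the image.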

Original comment by nguyen...@gmail.com on 24 Nov 2011 at 6:17

GoogleCodeExporter commented 9 years ago
Is there at least one Windows GUI utility that can load and save the new 
.box file format? I think I have already tried all of them, and none seems to 
work.

Original comment by sorin.sb...@gmail.com on 24 Nov 2011 at 6:22

GoogleCodeExporter commented 9 years ago
I added a new column of zeros and I do not see the error anymore. Still, I 
consider it a bug that tesseract does not detect the old box file format.

Even if it is not able to load the old format, it should at least complain 
about it, instead of failing after "successfully" parsing ~10 lines.
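
A check of this kind could be run by hand before training. The sketch below is not something Tesseract does; find_old_format_lines is a hypothetical helper that flags lines with a symbol followed by exactly four integers, which is what a box line without a page number looks like.

```python
def find_old_format_lines(lines):
    """Return 1-based numbers of lines that look like boxes without a page field.

    Hypothetical pre-check: a line is flagged when it has exactly five
    whitespace-separated fields and the last four are integers.
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) == 5 and all(f.lstrip("-").isdigit() for f in fields[1:]):
            flagged.append(number)
    return flagged


sample = ["0 7 50 19 80", ": 7 5 19 35 0"]
for number in find_old_format_lines(sample):
    print("line %d has no page number" % number)
```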

Original comment by sorin.sb...@gmail.com on 24 Nov 2011 at 6:44

GoogleCodeExporter commented 9 years ago
Let's summarize:
1. You did not follow the training instructions (create the box file with tesseract).
2. You (or the application of your choice) made a wrong box file, one that does 
not match the description in [1]. And you blame tesseract for not recognizing it ;-)
3. The training process improves with each version. It does not make sense to 
keep compatibility with, or test for, all possible mistakes... And yes, training 
will change again in the next version.
4. Tesseract error messages can sometimes be unclear.

Just follow the instructions and (hopefully) you will not face errors.

[1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Make_Box_Files

Original comment by zde...@gmail.com on 24 Nov 2011 at 8:07

GoogleCodeExporter commented 9 years ago
jTessBoxEditor on http://code.google.com/p/tesseract-ocr/wiki/AddOns supports 
both 2.0x and 3.0x box formats.

Original comment by nguyen...@gmail.com on 24 Nov 2011 at 8:09