Closed GoogleCodeExporter closed 9 years ago
The command used to run was
tesseract eng.counter.exp0.png end.counter.exp0 nobatch box.train.stderr
Original comment by sorin.sb...@gmail.com
on 24 Nov 2011 at 4:24
That was the old 2.0 format. The 3.0 format has a zero-based page number on each
line. You can try adding another column containing 0 to your box file.
Original comment by nguyen...@gmail.com
on 24 Nov 2011 at 6:17
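The suggestion above (adding a page column of 0 to each line) can be sketched as a small script. This is only an illustration, not part of tesseract: it assumes a 2.0-style line is a glyph followed by four integer coordinates, and that the page number goes in a new last column. The function names are made up for this example.

```python
def upgrade_box_line(line, page=0):
    """Return the line with a page column appended if it looks like 2.0 format.

    Assumption: a 2.0-style line has exactly five whitespace-separated
    fields (glyph plus four integer coordinates). Anything else is
    returned unchanged apart from its line ending.
    """
    fields = line.split()
    if len(fields) == 5 and all(f.isdigit() for f in fields[1:]):
        fields.append(str(page))
        return " ".join(fields)
    return line.rstrip("\r\n")


def upgrade_box_file(src_path, dst_path, page=0):
    """Copy a box file, adding the page column to every 2.0-style line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(upgrade_box_line(line, page) + "\n")
```

For example, `upgrade_box_file("old.box", "new.box")` (both file names are placeholders) writes a copy in which every five-field line gains a trailing 0, while lines that already have six fields are left as they are.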
Is there at least one Windows GUI utility that is able to load/save the new
.box file format? I think I may have already tried all of them, and none seems
to work.
Original comment by sorin.sb...@gmail.com
on 24 Nov 2011 at 6:22
I added a new column of zeros and I do not see the error anymore. Still, I
consider it a bug that tesseract does not detect an old configuration
file format.
Even if it is not able to load the old format, it should at least complain
about it, instead of failing after "successfully" parsing ~10 lines.
Original comment by sorin.sb...@gmail.com
on 24 Nov 2011 at 6:44
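The kind of up-front check asked for above could look roughly like the following. It is a sketch of a standalone pre-check, not a description of anything tesseract does. It assumes a 3.0-style line has six whitespace-separated fields (glyph, four integer coordinates, page number), and the function name and messages are invented for this example.

```python
def check_box_lines(lines):
    """Return a list of (line_number, problem) pairs for suspicious lines.

    Assumption: a valid 3.0-style line has six fields, the last five of
    which are non-negative integers. Blank lines are ignored.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 5:
            problems.append((number, "missing page column (2.0 format?)"))
        elif len(fields) != 6:
            problems.append((number, f"expected 6 fields, found {len(fields)}"))
        elif not all(f.isdigit() for f in fields[1:]):
            problems.append((number, "non-integer coordinate or page"))
    return problems
```

Running it over the lines of a box file before training would flag an old-format file on its first line, rather than leaving the mismatch to surface as an unrelated error later.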
Let's summarize:
1. You did not follow the training instructions (to create the box file with
tesseract).
2. You (or the application of your choice) made a wrong box file (not suitable,
or not according to the description in [1]). You blame tesseract for not
recognizing it ;-)
3. The training process is improving with each version. It does not make sense
to keep compatibility or to test for all possible mistakes... And yes, there
will be changes in training in the next version as well.
4. Tesseract error messages can sometimes be unclear.
Just follow the instructions and (hopefully) you will not face errors.
[1]
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Make_Box_Files
Original comment by zde...@gmail.com
on 24 Nov 2011 at 8:07
jTessBoxEditor on http://code.google.com/p/tesseract-ocr/wiki/AddOns supports
both 2.0x and 3.0x box formats.
Original comment by nguyen...@gmail.com
on 24 Nov 2011 at 8:09
Original issue reported on code.google.com by
sorin.sb...@gmail.com
on 24 Nov 2011 at 4:13