Training text errors out

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?

Please provide any additional information below.

Original issue reported on code.google.com by Vehix...@gmail.com on 12 Aug 2011 at 8:27

GoogleCodeExporter commented 9 years ago

I seem to have hit Enter before writing anything. Here's the issue:
I am training a font for numbers. After I modify the resulting box file, 
I get errors.

What steps will reproduce the problem?
1. Modify the box file after running the tesseract ... ... ... box.train command

What is the expected output? What do you see instead?
I expected a box file that produces no errors.

What version of the product are you using? On what operating system?
3.0 on Windows 7

Please provide any additional information below.
The idea is to train tesseract to recognize numbers in various images and to 
output them as a string. When training a new font of phone numbers, I get an 
error after I modify the box file. The same process worked for two other 
fonts, so I'm confused about why it would stop working here. Attached are the 
tif file, the original box file, and the modified one.
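
Before rerunning training, a hand-edited box file can be scanned for malformed lines. The sketch below is a minimal Python example. It assumes the Tesseract 3.0x box layout: one glyph per line, followed by left, bottom, right, top, and page number, separated by single spaces. `check_box_lines` is a hypothetical helper, not part of tesseract, and it is not known whether any of these problems caused the error reported here.

```python
def check_box_lines(lines):
    """Return (line_number, message) pairs for lines that look malformed.

    Assumed layout per line (Tesseract 3.0x): glyph left bottom right top page
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        fields = line.split(" ")
        if len(fields) != 6:
            problems.append((number, "expected 6 fields, found %d" % len(fields)))
            continue
        try:
            left, bottom, right, top, page = (int(f) for f in fields[1:])
        except ValueError:
            problems.append((number, "non-integer coordinate or page"))
            continue
        if left >= right or bottom >= top:
            problems.append((number, "empty or inverted box"))
        elif page < 0:
            problems.append((number, "negative page number"))
    return problems


if __name__ == "__main__":
    # Made-up sample lines for illustration, not taken from the attached files.
    sample = [
        "1 10 20 30 40 0",
        "2 35 20 30 40 0",   # left is greater than right
        "3 50 20 70 40",     # page number missing
    ]
    for number, message in check_box_lines(sample):
        print("line %d: %s" % (number, message))
```

To check a real file, pass the open file object, for example `check_box_lines(open(path))` with `path` pointing at the modified box file.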

Original comment by Vehix...@gmail.com on 12 Aug 2011 at 8:34

Attachments:

GoogleCodeExporter commented 9 years ago

1. It works for me with tesseract 3.01:
    tesseract eng.Lasha.samp.tif eng.Lasha.samp box.train (or)
    tesseract eng.Lasha.samp.tif eng.Lasha.samptemp box.train

Either command gives this output:
Tesseract Open Source OCR Engine v3.01 with Leptonica
Page 0
APPLY_BOXES:
   Boxes read from boxfile:      11
   Boxes failed resegmentation:       0
   Found 11 good blobs and 0 unlabelled blobs in 0 words.
   0 remaining unlabelled words deleted.
TRAINING ... Font name = Lasha
Generated training data for 4 words

=> no error

2. You are not following the instructions: 
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images

Original comment by zde...@gmail.com on 19 Apr 2012 at 7:12

Changed state: WorksForMe
