patcharats / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Error: 31 classes in inttemp while unicharset contains 32 unichars. #54

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Using the phototest.tif installed by Tesseract 2.0, regenerated the 8 data files with the prefix "eng." (eng.xxx).
2. Replaced the original 8 data files installed by Tesseract 2.0 with the regenerated ones.
3. Executed "tesseract phototest.tif output -l eng".
4. Instead of output.txt, tesseract.log was generated, containing "Error: 31 classes in inttemp while unicharset contains 32 unichars."
5. output.txt was not generated.

What is the expected output? What do you see instead?
The output produced with the regenerated 8 data files (eng.xxx) should be identical to the output produced with the original 8 data files. Instead, the log error above was generated. Why?

What version of the product are you using? On what operating system?
Tesseract 2.0 on MS Windows

Please provide any additional information below.
It appears there is some problem with the source code, which I have re-investigated.

Original issue reported on code.google.com by withbles...@gmail.com on 8 Aug 2007 at 4:45

GoogleCodeExporter commented 9 years ago
Please see the revised TrainingTesseract wiki.
You have to modify the box file to eliminate FATALITY errors when creating .tr files. Check the tesseract.log from that run. The error means that one of the characters in your box file has no usable samples (perhaps due to incorrect coordinates), and therefore the training process has failed.
It would be better if tesseract would abort and produce no .tr file than produce a bad one.
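
As a rough illustration of the checks described above, the following Python sketch flags box-file lines whose coordinates cannot describe a usable sample and pulls the FATALITY lines out of a tesseract.log. It is not part of Tesseract. It assumes each box-file line has the form "<symbol> <left> <bottom> <right> <top>" with the origin at the bottom-left of the image, and that log lines of interest contain the literal word FATALITY. The function names (parse_box_line, find_suspect_boxes, find_fatality_lines) are hypothetical helpers invented for this example.

```python
def parse_box_line(line):
    """Parse one box-file line, assumed to be '<symbol> <left> <bottom> <right> <top>'.

    Returns (symbol, left, bottom, right, top), or None if the line is malformed.
    """
    parts = line.split()
    if len(parts) < 5:
        return None
    try:
        left, bottom, right, top = (int(p) for p in parts[1:5])
    except ValueError:
        return None
    return parts[0], left, bottom, right, top


def find_suspect_boxes(box_text):
    """Return (line_number, line) pairs that are malformed or describe an empty box."""
    suspects = []
    for number, line in enumerate(box_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parsed = parse_box_line(line)
        if parsed is None:
            suspects.append((number, line))
            continue
        _, left, bottom, right, top = parsed
        # Assumption: origin is bottom-left, so a valid box has right > left
        # and top > bottom, with no negative coordinates.
        if left < 0 or bottom < 0 or right <= left or top <= bottom:
            suspects.append((number, line))
    return suspects


def find_fatality_lines(log_text):
    """Return every log line that mentions FATALITY."""
    return [line for line in log_text.splitlines() if "FATALITY" in line]
```

A typical use would be to read the .box file and tesseract.log into strings, call both functions, and fix any reported box lines before regenerating the .tr files. A box that passes these checks can still be wrong (for example, if it encloses the wrong glyph), so this only catches the most obvious coordinate errors.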

Original comment by theraysm...@gmail.com on 17 Aug 2007 at 4:13

GoogleCodeExporter commented 9 years ago
Several bugs in training fixed in 2.01.

Original comment by theraysm...@gmail.com on 30 Aug 2007 at 7:56