raffaeldantas / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Error while running tesseract for a new traindata #1439

Closed: GoogleCodeExporter closed this issue 8 years ago.

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Collect all the files for the traineddata.
2. Make the traineddata.
3. Put the traineddata in the tessdata folder and run tesseract.
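The report does not spell out the training steps. Here is a minimal sketch, assuming the Tesseract 3.0x command-line tools described in the TrainingTesseract3 wiki and the reporter's language "ban" and font "sl". It only lists the commands in order and does not run them. `training_commands` and `PREFIXED_OUTPUTS` are hypothetical names, and the dictionary/word-list steps are left out.

```python
# Hypothetical helper: lists (without running) the Tesseract 3.0x training
# commands from the TrainingTesseract3 wiki for one language and one font.
# Word-list / dictionary steps (wordlist2dawg) are deliberately left out.

# Files written by mftraining/cntraining that combine_tessdata expects to
# find under a "<lang>." prefix.
PREFIXED_OUTPUTS = ("inttemp", "normproto", "pffmtable", "shapetable")


def training_commands(lang: str, font: str, pages: int = 1) -> list[list[str]]:
    # Images are assumed to follow the [lang].[fontname].exp[num].tif convention.
    bases = [f"{lang}.{font}.exp{n}" for n in range(pages)]
    trs = [f"{b}.tr" for b in bases]
    cmds = []
    for base in bases:
        # Produce a .tr feature file from each image and its corrected .box file.
        cmds.append(["tesseract", f"{base}.tif", base, "box.train"])
    cmds.append(["unicharset_extractor"] + [f"{b}.box" for b in bases])
    cmds.append(["shapeclustering", "-F", "font_properties", "-U", "unicharset"] + trs)
    cmds.append(["mftraining", "-F", "font_properties", "-U", "unicharset",
                 "-O", f"{lang}.unicharset"] + trs)
    cmds.append(["cntraining"] + trs)
    for name in PREFIXED_OUTPUTS:
        # Add the language prefix so combine_tessdata can pick the file up.
        cmds.append(["mv", name, f"{lang}.{name}"])
    # Packs <lang>.unicharset, <lang>.inttemp, ... into <lang>.traineddata.
    cmds.append(["combine_tessdata", f"{lang}."])
    return cmds


if __name__ == "__main__":
    for cmd in training_commands("ban", "sl"):
        print(" ".join(cmd))
```

After these commands, the resulting ban.traineddata would go into the tessdata folder and be used with `tesseract image.tif out -l ban`. Both mftraining and shapeclustering read the font_properties file, so a malformed one breaks these steps.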

What is the expected output? What do you see instead?
The expected output is a text file containing the text from the image. Instead, I see the error
index >= 0 && index < size_used_:Error:Assert failed

Please use labels and text to provide additional information.
I have attached a screenshot of the error and the other files. My language name is ban and 
the font name is sl.

Original issue reported on code.google.com by m.tawfi...@gmail.com on 29 Mar 2015 at 4:03

Attachments:

GoogleCodeExporter commented 8 years ago
I think there is a problem with your font_properties file. It seems to have a 
blank line at the top, whereas the blank line should be at the end.
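Here is a minimal sketch of cleaning up such a file, assuming the Tesseract 3 font_properties layout of one font per line: `<fontname> <italic> <bold> <fixed> <serif> <fraktur>`, with each flag 0 or 1. It drops blank lines, including one at the top, and terminates the last entry with a newline. Some editors display that final newline as an empty line at the end. `normalize_font_properties` is a hypothetical name.

```python
def normalize_font_properties(text: str) -> str:
    """Drop blank lines, check each entry, and end the file with one newline.

    Assumes the Tesseract 3 layout:
    <fontname> <italic> <bold> <fixed> <serif> <fraktur>, flags being 0 or 1.
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # stray blank lines (e.g. at the top) are removed
        fields = line.split()
        if len(fields) != 6 or any(f not in ("0", "1") for f in fields[1:]):
            raise ValueError(f"line {lineno}: expected '<font> 0/1 x5', got {raw!r}")
        # Collapse runs of whitespace to single spaces.
        entries.append(" ".join(fields))
    # The last entry must be terminated by '\n'.
    return "\n".join(entries) + "\n"


if __name__ == "__main__":
    print(repr(normalize_font_properties("\nsl 0 0 0 0 0")))
```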

I was able to generate the traineddata with your files in jTessBoxEditor (I 
needed to add the word list and the frequent-words list, and to rename the font 
properties file to the naming convention the program requires).

BTW, there is already traineddata for Bangla - please see

https://code.google.com/p/tesseract-ocr/source/browse/ben.traineddata?repo=tessdata
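Here is a minimal sketch of using the existing Bangla model instead, assuming ben.traineddata has been copied into the tessdata folder and the tesseract executable is on PATH. `ocr_command` and `run_ocr` are hypothetical names. The command itself is the standard `tesseract <image> <outputbase> -l <lang>` form, which writes its result to `<outputbase>.txt`.

```python
import shutil
import subprocess


def ocr_command(image: str, out_base: str, lang: str = "ben") -> list[str]:
    # -l selects <lang>.traineddata from the tessdata folder.
    return ["tesseract", image, out_base, "-l", lang]


def run_ocr(image: str, out_base: str, lang: str = "ben") -> str:
    """Run tesseract and return the recognized text (requires tesseract on PATH)."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(ocr_command(image, out_base, lang), check=True)
    # tesseract writes its output next to out_base with a .txt extension.
    with open(out_base + ".txt", encoding="utf-8") as f:
        return f.read()
```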

and also see

https://code.google.com/p/tesseract-ocr/source/browse?repo=langdata#git%2Fben

Original comment by shreeshrii on 30 Mar 2015 at 8:50

GoogleCodeExporter commented 8 years ago
No, this does not work unless I leave a blank space in front of the first 
line. However, I am using the same tif file as input. By the way,

Original comment by m.tawfi...@gmail.com on 31 Mar 2015 at 2:27

GoogleCodeExporter commented 8 years ago
You did not follow the instructions [1]; e.g. font_properties.txt does not meet the 
"Requirements for text input files", so I guess you did not create a valid 
traineddata file.
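As I understand the wiki's "Requirements for text input files" (see [1] for the authoritative list), training text files should be ASCII or UTF-8 without a byte order mark. They should use Unix '\n' line endings, and their last character should be '\n'. Here is a minimal sketch of a checker for those points. `text_input_problems` is a hypothetical name.

```python
import codecs


def text_input_problems(data: bytes) -> list[str]:
    """Return problems found in a training text file's raw bytes (empty = OK)."""
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append("starts with a UTF-8 byte order mark")
    try:
        data.decode("utf-8")  # plain ASCII is also valid UTF-8
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 (byte {exc.start})")
    if b"\r" in data:
        problems.append("contains carriage returns (use Unix '\\n' line endings)")
    if not data.endswith(b"\n"):
        problems.append("last character is not '\\n'")
    return problems


if __name__ == "__main__":
    with open("font_properties", "rb") as f:
        print(text_input_problems(f.read()))
```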

Anyway, your issue is invalid: for support you should use the tesseract user 
forum. The issue tracker should only be used for reporting problems with 
Google-produced traineddata files.

[1] https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

Original comment by zde...@gmail.com on 9 Apr 2015 at 8:06