kcobra / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

problem with box file #473

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Making a box file

What is the expected output? What do you see instead?

I get the box file, but it is full of fatalities. I can create the 
traineddata file, but when I use it, it does not work.

What version of the product are you using? On what operating system?

Tesseract 3.00 on Mac OS X 10.4

Please provide any additional information below.

I am working with Egyptian hieroglyphs; you can find a font to read them here:

http://www.alanwood.net/unicode/egyptian-hieroglyphs.html
http://www.alanwood.net/unicode/fonts-african.html#egyptianhieroglyphs

I don't understand what is going wrong: the signs seem quite clear to me. 
Do you have any ideas for improving the TIFF image?

I attach the box file and the log. The full image is too big (I am using a 
12-page multipage TIFF), so I attach only the first page as an example.

Original issue reported on code.google.com by Oduss...@gmail.com on 6 Apr 2011 at 11:50

GoogleCodeExporter commented 9 years ago
I do not have answers for all the issues, but here are some findings:

First of all, the input image: use a resolution of at least 300 dpi and a low 
number of colors (I prefer 16 gray levels or just 2 colors). This also gives 
you a smaller file (see attachment hiero.egyptianhiero.exp2.png).
Next: I removed the other pages (see hiero.egyptianhiero.exp2.box), and then 
the number of errors decreased ;-)
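
Keeping only one page's boxes could be sketched as follows. This is a minimal sketch, not the procedure actually used here: it assumes the Tesseract 3.x box layout of `glyph left bottom right top page` with a zero-based page index, and `keep_page` is a hypothetical helper name, not part of Tesseract.

```python
def keep_page(box_lines, page=0):
    """Return only the box lines that belong to the given zero-based page.

    Assumes each line is 'glyph left bottom right top page'. Lines that do
    not match that layout are dropped.
    """
    kept = []
    for line in box_lines:
        # Split from the right so the glyph stays in one piece.
        parts = line.rstrip("\n").rsplit(" ", 5)
        if len(parts) != 6:
            continue  # not a six-field box line
        glyph, left, bottom, right, top, page_no = parts
        try:
            if int(page_no) != page:
                continue
        except ValueError:
            continue  # page field is not a number
        # Renumber to page 0 so the result matches a single-page image.
        kept.append(f"{glyph} {left} {bottom} {right} {top} 0")
    return kept
```

The renumbering to page 0 is a design choice for the case where the kept boxes will be paired with a single-page image cut from the multipage TIFF.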

I tried to train it in Tesseract 3.01 (it is in svn) and I got "better" log 
output; see tesseract.log. If you visualize it (see 
box-hiero.egyptianhiero.exp2.png: pink rectangles are boxes from the box file, 
blue are the errors "FAILURE! Couldn't find a matching blob" and green are 
"Unlabelled word at :Bounding box"), then it looks like Tesseract is not happy 
because of missing boxes (Unlabelled word at :Bounding box).
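
When drawing box-file rectangles over the page image, the y axis usually has to be flipped first. The sketch below shows only that conversion. It assumes box coordinates are measured from the bottom-left corner of the page and that the drawing tool uses a top-left origin. `box_to_image_rect` is a hypothetical helper name, and this is not necessarily how the attached visualization was produced.

```python
def box_to_image_rect(left, bottom, right, top, image_height):
    """Convert a box-file rectangle to top-left-origin image coordinates.

    Box files measure y upward from the bottom edge of the page, while most
    drawing libraries measure y downward from the top edge.
    Returns (x0, y0, x1, y1) with y0 <= y1 when bottom <= top.
    """
    return (left, image_height - top, right, image_height - bottom)
```

For example, on a page 100 pixels high, a box with bottom=20 and top=50 spans rows 50 to 80 when counted from the top.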

For "FAILURE! Couldn't find a matching blob" or "FAILURE! box overlaps no blobs 
or blobs in multiple rows" I have no suggestion for the moment...
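
To see which kind of message dominates in a training log, the messages can be tallied by type. This is a rough sketch: the three message fragments are the ones quoted in this thread, but the exact layout of the log lines is an assumption, and `tally_messages` is a hypothetical helper name.

```python
from collections import Counter

# Message fragments quoted in this thread. Real log lines may carry extra
# text (coordinates, page numbers), so we match by substring.
MESSAGES = (
    "FAILURE! Couldn't find a matching blob",
    "FAILURE! box overlaps no blobs or blobs in multiple rows",
    "Unlabelled word at :Bounding box",
)

def tally_messages(log_lines):
    """Count how many log lines contain each known message fragment."""
    counts = Counter()
    for line in log_lines:
        for message in MESSAGES:
            if message in line:
                counts[message] += 1
                break  # count each line at most once
    return counts
```

It can be fed an open file directly, for example `tally_messages(open("tesseract.log"))`, since iterating over a file yields its lines.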

Original comment by zde...@gmail.com on 6 Apr 2011 at 2:47

GoogleCodeExporter commented 9 years ago
I am merging this issue into issue 430. Tesseract 3.02 reports the real problem 
("Unlabelled word") plus 4 "FAILURE! Couldn't find a matching blob" errors (I 
am not sure about the reason, but the input image did not fulfill the 
requirements from 
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images).

Original comment by zde...@gmail.com on 24 Jul 2012 at 7:52