jacklicn / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Output text is cut off and continued on another line #523

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. tesseract exp12.tif out12 -l mag

What is the expected output? What do you see instead?
Expected output: 

Barrin's Spite
Barter in Blood
Baru, Fist of Krosa
Basal Sliver
.
.
.
.
Battlefield Forge

Seen output: 

Barrin's Spite
Barter in Blood
Baru, Fist of Kro
Basal Sliver
.
.
.
.
Battlefield Forge
sa

What version of the product are you using? On what operating system?
Tesseract 3.0 on Windows Vista Home Premium (x86)

Please provide any additional information below.

I do not know whether the warnings during training have anything to do with the issue.

This is the output of training:

tesseract mag.nina.Ent10.tif mag.nina.Ent10 nobatch box.train.stderr

Tesseract Open Source OCR Engine with Leptonica
APPLY_BOXES: boxfile 431/2/B ((170,947),(177,957)): WARNING! false row break
APPLY_BOXES: boxfile 818/2/C ((302,946),(309,956)): WARNING! false row break
APPLY_BOXES: boxfile 1981/2/L ((561,575),(568,585)): WARNING! false row break
APPLY_BOXES: boxfile 2057/2/M ((742,580),(752,590)): WARNING! false row break
APPLY_BOXES: boxfile 2120/6/N ((922,580),(930,590)): WARNING! false row break
APPLY_BOXES: boxfile 2147/2/O ((1089,580),(1099,590)): WARNING! false row break
APPLY_BOXES: boxfile 2272/2/V ((1036,487),(1044,497)): WARNING! false row break
APPLY_BOXES: boxfile 2571/2/5 ((795,252),(802,262)): WARNING! false row break
APPLY_BOXES: boxfile 2609/6/U ((859,310),(868,320)): WARNING! false row break
APPLY_BOXES: boxfile 2629/2/V ((1035,312),(1044,322)): WARNING! false row break
APPLY_BOXES: More than one block??
APPLY_BOXES: Unlabelled word blk:5 row:8 allrows:57
APPLY_BOXES:
   Boxes read from boxfile:    4116
   Initially labelled blobs:   4116 in 229 rows
   Box failures detected:            0
   Duped blobs for rebalance:     0
   "AE" has fewest samples:     1
                Total unlabelled words:        1
                Final labelled words:       4116
Generating training data
TRAINING ... Font name = nina
Generated training data for 4116 blobs
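
To make the warnings above easier to cross-check against the box file, here is a minimal sketch in Python that pulls the fields out of each "APPLY_BOXES: boxfile ..." warning line and computes the box size. It is not part of Tesseract. The function name parse_warnings and the key names are made up for illustration, the meaning of the first two slash-separated numbers is not stated in the log (the first looks like a box-file line number, but that is an assumption), and the coordinates are assumed to be pixel corners in the form ((x0,y0),(x1,y1)).

```python
import re

# Matches lines such as:
# APPLY_BOXES: boxfile 431/2/B ((170,947),(177,957)): WARNING! false row break
# The semantics of the two leading numbers are an assumption (see lead-in).
WARNING_RE = re.compile(
    r"APPLY_BOXES: boxfile (\d+)/(\d+)/(\S+) "
    r"\(\((\d+),(\d+)\),\((\d+),(\d+)\)\): WARNING! (.+)"
)


def parse_warnings(log_text):
    """Return one dict per APPLY_BOXES warning line found in log_text."""
    warnings = []
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match is None:
            continue  # summary lines and other messages are skipped
        x0, y0, x1, y1 = (int(v) for v in match.group(4, 5, 6, 7))
        warnings.append({
            "first": int(match.group(1)),   # possibly the box-file line number
            "second": int(match.group(2)),  # meaning not stated in the log
            "label": match.group(3),        # character label from the box file
            "width": x1 - x0,               # box width, assuming pixel corners
            "height": y1 - y0,              # box height, assuming pixel corners
            "message": match.group(8).strip(),
        })
    return warnings


# Two warning lines copied from the training output, plus one line
# that is not a per-box warning and is therefore ignored.
sample = """APPLY_BOXES: boxfile 431/2/B ((170,947),(177,957)): WARNING! false row break
APPLY_BOXES: boxfile 2120/6/N ((922,580),(930,590)): WARNING! false row break
APPLY_BOXES: More than one block??
"""

for w in parse_warnings(sample):
    print(w["first"], w["label"], w["width"], w["height"], w["message"])
```

Under that coordinate assumption, every box flagged in the log above comes out 10 units tall and between 7 and 10 units wide.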

Original issue reported on code.google.com by olijuana...@gmail.com on 28 Jul 2011 at 5:48


GoogleCodeExporter commented 9 years ago
Please try recent code (3.01 version solved a lot of APPLY_BOXES problems) and
consider pre-processing images to meet suggested image/font size criteria (see
FAQ e.g.
http://code.google.com/p/tesseract-ocr/wiki/FAQ#Is_there_a_Minimum_Text_Size?_(It_won't_read_screen_text!))

Original comment by zde...@gmail.com on 17 Apr 2012 at 9:15