issue training tesseract 3.0

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Run: tesseract lang.font.number.tiff lang.font.number nobatch box.train

What is the expected output? What do you see instead?

I've got the following error:

mf.cpp:78: failed assertion `!isnan(Feature->Params[i])'
Abort trap

What version of the product are you using? On what operating system?

Tesseract 3.00 on Mac OS X 10.4

Please provide any additional information below.

I checked the resulting .tr file, and it seems that it recognizes the 
unmodified characters, but it doesn't recognize the boxes I changed 
manually in the .box file. I'm training Tesseract on a special character 
set which is very large; could that be the problem?

Here is the log:

Tesseract Open Source OCR Engine with Leptonica
APPLY_BOXES: boxfile 4/1/𓀁 ((578,9944),(619,9987)): FAILURE! box overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:5 row:1 "𓇋"
APPLY_BOXES: boxfile 58/1/𓅱 ((526,8458),(567,8501)): FAILURE! box overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:5 row:22 "𓏏𓏭"
APPLY_BOXES: boxfile 108/1/𓐍𓂋 ((27,6916),(80,6959)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: boxfile 108/2/𓐛 ((92,6927),(140,6944)): WARNING! false row break
APPLY_BOXES: boxfile 108/3/𓆱𓐍𓏏 ((153,6917),(209,6964)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: boxfile 108/4/𓎼𓂋 ((223,6914),(274,6958)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: boxfile 130/3/𓇾𓇾𓇾 ((1510,6229),(1552,6265)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: boxfile 141/1/𓏤 ((1136,5919),(1141,5931)): FAILURE! box overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:5 row:60 "𓂻𓏪"
APPLY_BOXES: boxfile 155/1/𓏤 ((1120,5578),(1126,5589)): FAILURE! box overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:5 row:65 "𓂡𓏪"
APPLY_BOXES: boxfile 219/1/𓃀 ((2393,3687),(2415,3728)): FAILURE! box overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:5 row:93 "𓏌𓏏"
APPLY_BOXES: boxfile 227/10/𓏌𓂝𓁽 ((774,3487),(839,3545)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: boxfile 227/12/𓉞𓉞𓉞 ((898,3494),(954,3538)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: boxfile 236/13/𓍋 ((913,3081),(932,3124)): WARNING! false row break
APPLY_BOXES: boxfile 244/1/𓆷 ((47,2693),(101,2735)): FAILURE! box overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:5 row:6 "𓅓"
APPLY_BOXES: boxfile 254/1/𓅱 ((42,2420),(71,2463)): FAILURE! box overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:5 row:10 "𓏂"
APPLY_BOXES: boxfile 258/1/𓅓 ((60,2348),(103,2394)): FAILURE! box overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:5 row:11 "𓄑𓏏𓊵"
APPLY_BOXES: boxfile 265/1/𓏤 ((1276,2163),(1282,2176)): FAILURE! box overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:5 row:14 "𓏏𓏤𓏤"
APPLY_BOXES: boxfile 291/5/𓉞 ((2174,1513),(2193,1552)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: boxfile 334/1/𓂝𓂬 ((54,376),(96,421)): FAILURE! box overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:5 row:40 "𓎡𓆑"
APPLY_BOXES: boxfile 343/1/𓐍 ((395,267),(416,286)): FAILURE! box overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:5 row:42 "𓅜"
APPLY_BOXES: More than one block??
APPLY_BOXES: Unlabelled word blk:1 row:1 allrows:1
APPLY_BOXES: Unlabelled word blk:2 row:4 allrows:5
APPLY_BOXES: Unlabelled word blk:2 row:6 allrows:7
APPLY_BOXES: Unlabelled word blk:2 row:6 allrows:7
APPLY_BOXES: Unlabelled word blk:2 row:6 allrows:7
APPLY_BOXES: Unlabelled word blk:2 row:6 allrows:7
APPLY_BOXES: Unlabelled word blk:2 row:7 allrows:8
APPLY_BOXES: Unlabelled word blk:2 row:7 allrows:8
APPLY_BOXES: Unlabelled word blk:2 row:14 allrows:15
APPLY_BOXES: Unlabelled word blk:2 row:14 allrows:15
APPLY_BOXES: Unlabelled word blk:2 row:14 allrows:15
APPLY_BOXES: Unlabelled word blk:2 row:15 allrows:16
APPLY_BOXES: Unlabelled word blk:2 row:15 allrows:16
APPLY_BOXES: Unlabelled word blk:2 row:16 allrows:17
APPLY_BOXES: Unlabelled word blk:2 row:17 allrows:18
APPLY_BOXES: Unlabelled word blk:2 row:18 allrows:19
APPLY_BOXES: Unlabelled word blk:2 row:18 allrows:19
APPLY_BOXES: Unlabelled word blk:2 row:20 allrows:21
APPLY_BOXES: Unlabelled word blk:2 row:20 allrows:21
APPLY_BOXES: Unlabelled word blk:2 row:20 allrows:21
APPLY_BOXES: Unlabelled word blk:2 row:20 allrows:21
APPLY_BOXES: Unlabelled word blk:2 row:20 allrows:21
APPLY_BOXES: Unlabelled word blk:2 row:20 allrows:21
APPLY_BOXES: Unlabelled word blk:2 row:21 allrows:22
APPLY_BOXES: Unlabelled word blk:2 row:22 allrows:23
APPLY_BOXES: Unlabelled word blk:2 row:22 allrows:23
APPLY_BOXES: Unlabelled word blk:2 row:22 allrows:23
APPLY_BOXES: Unlabelled word blk:2 row:23 allrows:24
APPLY_BOXES: Unlabelled word blk:2 row:25 allrows:26
APPLY_BOXES: Unlabelled word blk:2 row:26 allrows:27
APPLY_BOXES: Unlabelled word blk:2 row:29 allrows:30
APPLY_BOXES: Unlabelled word blk:2 row:30 allrows:31
APPLY_BOXES: Unlabelled word blk:2 row:30 allrows:31
APPLY_BOXES: Unlabelled word blk:2 row:31 allrows:32
APPLY_BOXES: Unlabelled word blk:2 row:31 allrows:32
APPLY_BOXES: Unlabelled word blk:2 row:32 allrows:33
APPLY_BOXES: Unlabelled word blk:2 row:33 allrows:34
APPLY_BOXES: Unlabelled word blk:2 row:35 allrows:36
APPLY_BOXES: Unlabelled word blk:2 row:35 allrows:36
APPLY_BOXES: Unlabelled word blk:2 row:38 allrows:39
APPLY_BOXES: Unlabelled word blk:2 row:39 allrows:40
APPLY_BOXES: Unlabelled word blk:2 row:40 allrows:41
APPLY_BOXES: Unlabelled word blk:2 row:40 allrows:41
APPLY_BOXES: Unlabelled word blk:2 row:42 allrows:43
APPLY_BOXES: Unlabelled word blk:2 row:43 allrows:44
APPLY_BOXES: Unlabelled word blk:2 row:44 allrows:45
APPLY_BOXES: Unlabelled word blk:2 row:44 allrows:45
APPLY_BOXES: Unlabelled word blk:2 row:44 allrows:45
APPLY_BOXES: Unlabelled word blk:2 row:45 allrows:46
APPLY_BOXES: Unlabelled word blk:2 row:45 allrows:46
APPLY_BOXES: Unlabelled word blk:2 row:46 allrows:47
APPLY_BOXES: Unlabelled word blk:2 row:49 allrows:50
APPLY_BOXES: Unlabelled word blk:2 row:51 allrows:52
APPLY_BOXES: Unlabelled word blk:2 row:52 allrows:53
APPLY_BOXES: Unlabelled word blk:2 row:53 allrows:54
APPLY_BOXES: Unlabelled word blk:2 row:54 allrows:55
APPLY_BOXES: Unlabelled word blk:2 row:55 allrows:56
APPLY_BOXES: Unlabelled word blk:2 row:57 allrows:58
APPLY_BOXES: Unlabelled word blk:2 row:59 allrows:60
APPLY_BOXES: Unlabelled word blk:2 row:59 allrows:60
APPLY_BOXES: Unlabelled word blk:2 row:60 allrows:61
APPLY_BOXES: Unlabelled word blk:2 row:61 allrows:62
APPLY_BOXES: Unlabelled word blk:2 row:62 allrows:63
APPLY_BOXES: Unlabelled word blk:2 row:63 allrows:64
APPLY_BOXES: Unlabelled word blk:2 row:64 allrows:65
APPLY_BOXES: Unlabelled word blk:2 row:65 allrows:66
APPLY_BOXES: Unlabelled word blk:2 row:65 allrows:66
APPLY_BOXES: Unlabelled word blk:2 row:67 allrows:68
APPLY_BOXES: Unlabelled word blk:2 row:69 allrows:70
APPLY_BOXES: Unlabelled word blk:2 row:69 allrows:70
APPLY_BOXES: Unlabelled word blk:2 row:70 allrows:71
APPLY_BOXES: Unlabelled word blk:2 row:70 allrows:71
APPLY_BOXES: Unlabelled word blk:2 row:70 allrows:71
APPLY_BOXES: Unlabelled word blk:2 row:70 allrows:71
APPLY_BOXES: Unlabelled word blk:2 row:70 allrows:71
APPLY_BOXES: Unlabelled word blk:2 row:71 allrows:72
APPLY_BOXES: Unlabelled word blk:2 row:71 allrows:72
APPLY_BOXES: Unlabelled word blk:2 row:71 allrows:72
APPLY_BOXES: Unlabelled word blk:2 row:72 allrows:73
APPLY_BOXES: Unlabelled word blk:2 row:77 allrows:78
APPLY_BOXES: Unlabelled word blk:2 row:78 allrows:79
APPLY_BOXES: Unlabelled word blk:2 row:78 allrows:79
APPLY_BOXES: Unlabelled word blk:2 row:79 allrows:80
APPLY_BOXES: Unlabelled word blk:2 row:80 allrows:81
APPLY_BOXES: Unlabelled word blk:2 row:81 allrows:82
APPLY_BOXES: Unlabelled word blk:2 row:84 allrows:85
APPLY_BOXES: Unlabelled word blk:2 row:84 allrows:85
APPLY_BOXES: Unlabelled word blk:2 row:85 allrows:86
APPLY_BOXES: Unlabelled word blk:2 row:85 allrows:86
APPLY_BOXES: Unlabelled word blk:2 row:86 allrows:87
APPLY_BOXES: Unlabelled word blk:2 row:86 allrows:87
APPLY_BOXES: Unlabelled word blk:2 row:87 allrows:88
APPLY_BOXES: Unlabelled word blk:2 row:87 allrows:88
APPLY_BOXES: Unlabelled word blk:2 row:92 allrows:93
APPLY_BOXES: Unlabelled word blk:2 row:92 allrows:93
APPLY_BOXES: Unlabelled word blk:2 row:93 allrows:94
APPLY_BOXES: Unlabelled word blk:2 row:93 allrows:94
APPLY_BOXES: Unlabelled word blk:2 row:93 allrows:94
APPLY_BOXES: Unlabelled word blk:2 row:93 allrows:94
APPLY_BOXES: Unlabelled word blk:2 row:94 allrows:95
APPLY_BOXES: Unlabelled word blk:2 row:94 allrows:95
APPLY_BOXES: Unlabelled word blk:2 row:94 allrows:95
APPLY_BOXES: Unlabelled word blk:2 row:95 allrows:96
APPLY_BOXES: Unlabelled word blk:2 row:97 allrows:98
APPLY_BOXES: Unlabelled word blk:2 row:99 allrows:100
APPLY_BOXES: Unlabelled word blk:2 row:99 allrows:100
APPLY_BOXES: Unlabelled word blk:2 row:99 allrows:100
APPLY_BOXES: Unlabelled word blk:2 row:100 allrows:101
APPLY_BOXES: Unlabelled word blk:2 row:101 allrows:102
APPLY_BOXES: Unlabelled word blk:2 row:101 allrows:102
APPLY_BOXES: Unlabelled word blk:3 row:1 allrows:103
APPLY_BOXES: Unlabelled word blk:4 row:1 allrows:104
APPLY_BOXES: Unlabelled word blk:5 row:1 allrows:105
APPLY_BOXES: Unlabelled word blk:5 row:1 allrows:105
APPLY_BOXES: Unlabelled word blk:5 row:2 allrows:106
APPLY_BOXES: Unlabelled word blk:5 row:3 allrows:107
APPLY_BOXES: Unlabelled word blk:5 row:3 allrows:107
APPLY_BOXES: Unlabelled word blk:5 row:3 allrows:107
APPLY_BOXES: Unlabelled word blk:5 row:3 allrows:107
APPLY_BOXES: Unlabelled word blk:5 row:5 allrows:109
APPLY_BOXES: Unlabelled word blk:5 row:5 allrows:109
APPLY_BOXES: Unlabelled word blk:5 row:6 allrows:110
APPLY_BOXES: Unlabelled word blk:5 row:6 allrows:110
APPLY_BOXES: Unlabelled word blk:5 row:7 allrows:111
APPLY_BOXES: Unlabelled word blk:5 row:8 allrows:112
APPLY_BOXES: Unlabelled word blk:5 row:8 allrows:112
APPLY_BOXES: Unlabelled word blk:5 row:8 allrows:112
APPLY_BOXES: Unlabelled word blk:5 row:8 allrows:112
APPLY_BOXES: Unlabelled word blk:5 row:8 allrows:112
APPLY_BOXES: Unlabelled word blk:5 row:8 allrows:112
APPLY_BOXES: Unlabelled word blk:5 row:8 allrows:112
APPLY_BOXES: Unlabelled word blk:5 row:9 allrows:113
APPLY_BOXES: Unlabelled word blk:5 row:9 allrows:113
APPLY_BOXES: Unlabelled word blk:5 row:10 allrows:114
APPLY_BOXES: Unlabelled word blk:5 row:10 allrows:114
APPLY_BOXES: Unlabelled word blk:5 row:10 allrows:114
APPLY_BOXES: Unlabelled word blk:5 row:10 allrows:114
APPLY_BOXES: Unlabelled word blk:5 row:11 allrows:115
APPLY_BOXES: Unlabelled word blk:5 row:13 allrows:117
APPLY_BOXES: Unlabelled word blk:5 row:13 allrows:117
APPLY_BOXES: Unlabelled word blk:5 row:13 allrows:117
APPLY_BOXES: Unlabelled word blk:5 row:13 allrows:117
APPLY_BOXES: Unlabelled word blk:5 row:13 allrows:117
APPLY_BOXES: Unlabelled word blk:5 row:13 allrows:117
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:14 allrows:118
APPLY_BOXES: Unlabelled word blk:5 row:15 allrows:119
APPLY_BOXES: Unlabelled word blk:5 row:15 allrows:119
APPLY_BOXES: Unlabelled word blk:5 row:15 allrows:119
APPLY_BOXES: Unlabelled word blk:5 row:15 allrows:119
APPLY_BOXES: Unlabelled word blk:5 row:15 allrows:119
APPLY_BOXES: Unlabelled word blk:5 row:15 allrows:119
APPLY_BOXES: Unlabelled word blk:5 row:15 allrows:119
APPLY_BOXES: Unlabelled word blk:5 row:15 allrows:119
APPLY_BOXES: Unlabelled word blk:5 row:15 allrows:119
APPLY_BOXES: Unlabelled word blk:5 row:16 allrows:120
APPLY_BOXES: Unlabelled word blk:5 row:16 allrows:120
APPLY_BOXES: Unlabelled word blk:5 row:17 allrows:121
APPLY_BOXES: Unlabelled word blk:5 row:17 allrows:121
APPLY_BOXES: Unlabelled word blk:5 row:17 allrows:121
APPLY_BOXES: Unlabelled word blk:5 row:18 allrows:122
APPLY_BOXES: Unlabelled word blk:5 row:18 allrows:122
APPLY_BOXES: Unlabelled word blk:5 row:18 allrows:122
APPLY_BOXES: Unlabelled word blk:5 row:19 allrows:123
APPLY_BOXES: Unlabelled word blk:5 row:19 allrows:123
APPLY_BOXES: Unlabelled word blk:5 row:19 allrows:123
APPLY_BOXES: Unlabelled word blk:5 row:19 allrows:123
APPLY_BOXES: Unlabelled word blk:5 row:19 allrows:123
APPLY_BOXES: Unlabelled word blk:5 row:19 allrows:123
APPLY_BOXES: Unlabelled word blk:5 row:19 allrows:123
APPLY_BOXES: Unlabelled word blk:5 row:20 allrows:124
APPLY_BOXES: Unlabelled word blk:5 row:20 allrows:124
APPLY_BOXES: Unlabelled word blk:5 row:20 allrows:124
APPLY_BOXES: Unlabelled word blk:5 row:20 allrows:124
APPLY_BOXES: Unlabelled word blk:5 row:20 allrows:124
APPLY_BOXES: Unlabelled word blk:5 row:20 allrows:124
APPLY_BOXES: Unlabelled word blk:5 row:21 allrows:125
APPLY_BOXES: Unlabelled word blk:5 row:21 allrows:125
APPLY_BOXES: Unlabelled word blk:5 row:21 allrows:125
APPLY_BOXES: Unlabelled word blk:5 row:22 allrows:126
APPLY_BOXES: Unlabelled word blk:5 row:22 allrows:126
APPLY_BOXES: Unlabelled word blk:5 row:22 allrows:126
APPLY_BOXES: Unlabelled word blk:5 row:22 allrows:126
APPLY_BOXES: Unlabelled word blk:5 row:23 allrows:127
APPLY_BOXES: Unlabelled word blk:5 row:23 allrows:127
APPLY_BOXES: Unlabelled word blk:5 row:23 allrows:127
APPLY_BOXES: Unlabelled word blk:5 row:24 allrows:128
APPLY_BOXES: Unlabelled word blk:5 row:24 allrows:128
APPLY_BOXES: Unlabelled word blk:5 row:25 allrows:129
APPLY_BOXES: Unlabelled word blk:5 row:25 allrows:129
APPLY_BOXES: Unlabelled word blk:5 row:25 allrows:129
APPLY_BOXES: Unlabelled word blk:5 row:26 allrows:130
APPLY_BOXES: Unlabelled word blk:5 row:26 allrows:130
APPLY_BOXES: Unlabelled word blk:5 row:26 allrows:130
APPLY_BOXES: Unlabelled word blk:5 row:26 allrows:130
APPLY_BOXES: Unlabelled word blk:5 row:27 allrows:131
APPLY_BOXES: Unlabelled word blk:5 row:27 allrows:131
APPLY_BOXES: Unlabelled word blk:5 row:29 allrows:133
APPLY_BOXES: Unlabelled word blk:5 row:29 allrows:133
APPLY_BOXES: Unlabelled word blk:5 row:30 allrows:134
APPLY_BOXES: Unlabelled word blk:5 row:30 allrows:134
APPLY_BOXES: Unlabelled word blk:5 row:31 allrows:135
APPLY_BOXES: Unlabelled word blk:5 row:31 allrows:135
APPLY_BOXES: Unlabelled word blk:5 row:31 allrows:135
APPLY_BOXES: Unlabelled word blk:5 row:32 allrows:136
APPLY_BOXES: Unlabelled word blk:5 row:33 allrows:137
APPLY_BOXES: Unlabelled word blk:5 row:33 allrows:137
APPLY_BOXES: Unlabelled word blk:5 row:33 allrows:137
APPLY_BOXES: Unlabelled word blk:5 row:33 allrows:137
APPLY_BOXES: Unlabelled word blk:5 row:33 allrows:137
APPLY_BOXES: Unlabelled word blk:5 row:34 allrows:138
APPLY_BOXES: Unlabelled word blk:5 row:34 allrows:138
APPLY_BOXES: Unlabelled word blk:5 row:34 allrows:138
APPLY_BOXES: Unlabelled word blk:5 row:35 allrows:139
APPLY_BOXES: Unlabelled word blk:5 row:35 allrows:139
APPLY_BOXES: Unlabelled word blk:5 row:36 allrows:140
APPLY_BOXES: Unlabelled word blk:5 row:36 allrows:140
APPLY_BOXES: Unlabelled word blk:5 row:37 allrows:141
APPLY_BOXES: Unlabelled word blk:5 row:38 allrows:142
APPLY_BOXES: Unlabelled word blk:5 row:39 allrows:143
APPLY_BOXES: Unlabelled word blk:5 row:40 allrows:144
APPLY_BOXES: Unlabelled word blk:5 row:42 allrows:146
APPLY_BOXES: Unlabelled word blk:5 row:42 allrows:146
APPLY_BOXES: Unlabelled word blk:5 row:42 allrows:146
APPLY_BOXES: Unlabelled word blk:5 row:42 allrows:146
APPLY_BOXES: Unlabelled word blk:5 row:42 allrows:146
APPLY_BOXES: Unlabelled word blk:5 row:42 allrows:146
APPLY_BOXES: Unlabelled word blk:5 row:44 allrows:148
APPLY_BOXES: Unlabelled word blk:5 row:45 allrows:149
APPLY_BOXES: Unlabelled word blk:5 row:45 allrows:149
APPLY_BOXES: REBALANCE REQD "𓇋 [131cb ]" - target of 261 from 260 labelled samples
APPLY_BOXES: REBALANCE REQD "𓀁 [13001 ]" - target of 27 from 26 labelled samples
APPLY_BOXES: REBALANCE REQD "𓃀 [130c0 ]" - target of 41 from 40 labelled samples
APPLY_BOXES: REBALANCE REQD "𓅱 [13171 ]" - target of 214 from 212 labelled samples
APPLY_BOXES: REBALANCE REQD "𓆷 [131b7 ]" - target of 9 from 8 labelled samples
APPLY_BOXES: REBALANCE REQD "𓏏𓏤𓏤 [133cf 133e4 133e4 ]" - target of 8 from 7 labelled samples
APPLY_BOXES: REBALANCE REQD "𓏤 [133e4 ]" - target of 102 from 99 labelled samples
APPLY_BOXES: REBALANCE REQD "𓅓 [13153 ]" - target of 161 from 159 labelled samples
APPLY_BOXES: REBALANCE REQD "𓇾𓇾𓇾 [131fe 131fe 131fe ]" - target of 5 from 4 labelled samples
APPLY_BOXES: REBALANCE REQD "𓐍𓂋 [1340d 1308b ]" - target of 20 from 19 labelled samples
APPLY_BOXES: REBALANCE REQD "𓆱𓐍𓏏 [131b1 1340d 133cf ]" - target of 10 from 9 labelled samples
APPLY_BOXES: REBALANCE REQD "𓏏𓏭 [133cf 133ed ]" - target of 10 from 9 labelled samples
APPLY_BOXES: REBALANCE REQD "𓏌𓏏 [133cc 133cf ]" - target of 11 from 10 labelled samples
APPLY_BOXES: REBALANCE REQD "𓂡𓏪 [130a1 133ea ]" - target of 6 from 5 labelled samples
APPLY_BOXES: REBALANCE REQD "𓐍 [1340d ]" - target of 3 from 2 labelled samples
APPLY_BOXES: REBALANCE REQD "𓎼𓂋 [133bc 1308b ]" - target of 4 from 3 labelled samples
APPLY_BOXES: REBALANCE REQD "𓉞 [1325e ]" - target of 3 from 2 labelled samples
APPLY_BOXES: REBALANCE REQD "𓂝𓂬 [1309d 130ac ]" - target of 6 from 5 labelled samples
APPLY_BOXES: REBALANCE REQD "𓂻𓏪 [130bb 133ea ]" - target of 3 from 2 labelled samples
APPLY_BOXES: REBALANCE REQD "𓎡𓆑 [133a1 13191 ]" - target of 7 from 6 labelled samples
APPLY_BOXES: REBALANCE REQD "𓅜 [1315c ]" - target of 3 from 2 labelled samples
APPLY_BOXES: FATALITY - 0 labelled samples of "𓏌𓂝𓁽 [133cc 1309d 1307d ]" - target is 1:
APPLY_BOXES: FATALITY - 0 labelled samples of "𓉞𓉞𓉞 [1325e 1325e 1325e ]" - target is 1:
APPLY_BOXES: FATALITY - 0 labelled samples of "𓏂 [133c2 ]" - target is 1:
APPLY_BOXES: FATALITY - 0 labelled samples of "𓄑𓏏𓊵 [13111 133cf 132b5 ]" - target is 1:
APPLY_BOXES:
   Boxes read from boxfile:    4753
   Initially labelled blobs:   4724 in 149 rows
   Box failures detected:           29
   Duped blobs for rebalance:    25
   "𓏌𓂝𓁽" has fewest samples:     0
                Total unlabelled words:      241
                Final labelled words:       4749
Generating training data
TRAINING ... Font name = sethe
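
A log this long is easier to read as a tally. The following is a minimal Python sketch, not part of Tesseract, that counts the APPLY_BOXES messages by kind and lists the labels reported with zero samples. The function name, the category labels and the sample excerpt are choices made for this illustration; the marker strings are taken from the log above. It only looks at lines that start with "APPLY_BOXES:", so it works whether or not long messages have been hard-wrapped.

```python
import re
from collections import Counter

# Message kinds keyed by a marker substring seen in the APPLY_BOXES log above.
# The names on the left are labels chosen for this sketch only.
MARKERS = [
    ("failure", "FAILURE!"),
    ("warning", "WARNING!"),
    ("ignored_char", "ALSO ignoring corrupted char"),
    ("unlabelled_word", "Unlabelled word"),
    ("rebalance", "REBALANCE REQD"),
    ("no_samples", "FATALITY"),
]

# Captures the glyph string that follows the opening quote of a FATALITY line.
NO_SAMPLES_RE = re.compile(r'FATALITY - 0 labelled samples of "(\S+)')


def summarize_apply_boxes(log_text):
    """Return (counts per message kind, labels with zero labelled samples)."""
    counts = Counter()
    missing = []
    for line in log_text.splitlines():
        if not line.startswith("APPLY_BOXES:"):
            continue  # skips continuation lines of hard-wrapped messages
        for kind, marker in MARKERS:
            if marker in line:
                counts[kind] += 1
                break
        match = NO_SAMPLES_RE.search(line)
        if match:
            missing.append(match.group(1))
    return counts, missing


# Short excerpt of the log above, including two hard-wrapped messages.
SAMPLE_LOG = """\
APPLY_BOXES: boxfile 4/1/𓀁 ((578,9944),(619,9987)): FAILURE! box overlaps
blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:5 row:1 "𓇋"
APPLY_BOXES: boxfile 108/2/𓐛 ((92,6927),(140,6944)): WARNING! false row break
APPLY_BOXES: Unlabelled word blk:1 row:1 allrows:1
APPLY_BOXES: REBALANCE REQD "𓇋 [131cb ]" - target of 261 from 260 labelled
samples
APPLY_BOXES: FATALITY - 0 labelled samples of "𓏂 [133c2 ]" - target is 1:
"""

if __name__ == "__main__":
    counts, missing = summarize_apply_boxes(SAMPLE_LOG)
    for kind, n in counts.items():
        print(f"{kind}: {n}")
    print("labels with no samples:", " ".join(missing))
```

Counting the lines of the full log above gives 18 FAILURE! messages and 11 "ALSO ignoring corrupted char" messages, which together match the 29 reported under "Box failures detected".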

Original issue reported on code.google.com by Oduss...@gmail.com on 15 Jan 2011 at 4:04

GoogleCodeExporter commented 9 years ago

Please post example files: TIFF, unmodified box file, modified box file...

Original comment by zde...@gmail.com on 15 Jan 2011 at 7:33

GoogleCodeExporter commented 9 years ago

Hi!

I tried to post the files some days ago, but it seems it didn't work. Well, 
I'll try again now.
I am attaching the image I used, the modified .box file and the .tr file.

I have also tried splitting the TIFF image into a multi-page TIFF, but that 
doesn't work either.

Original comment by Oduss...@gmail.com on 27 Jan 2011 at 2:34

GoogleCodeExporter commented 9 years ago

Interesting task! What font do you use for displaying hieroglyphs? What program 
did you use for editing the box file?

Original comment by zde...@gmail.com on 28 Jan 2011 at 9:06

GoogleCodeExporter commented 9 years ago

And also a little bit crazy, I think..

As for the font, I am using one I created myself. It is quite good; I tried to 
upload it here, but it seems that Google doesn't like it.
However, you can use this font:
http://www.alanwood.net/unicode/egyptian-hieroglyphs.html
http://www.alanwood.net/unicode/fonts-african.html#egyptianhieroglyphs

It uses the same Unicode slots I use, so it is a valid alternative, at least 
for tests.

As for the program, it's a homemade solution, based on this program for writing 
Chinese and other non-Latin scripts:

http://openvanilla.org/index-en.php

I created an additional input method which allows me to display the signs by 
typing their Gardiner codes (here is the Gardiner list:  
http://de.wikipedia.org/wiki/Gardiner-Liste )

So.. interesting task, or impossible task?

Could it be that the TIFF is too big? Maybe too many lines, or too many signs?

Original comment by Oduss...@gmail.com on 28 Jan 2011 at 4:33

GoogleCodeExporter commented 9 years ago

Ah, I forgot to say that I did it in two steps: first I created a .box file 
covering roughly 1/3 of the .tiff, and then I used the trained language data 
obtained that way to create the box file for the whole page.

The first .box file worked without problems; the second one gives me this issue..

Original comment by Oduss...@gmail.com on 28 Jan 2011 at 4:37

GoogleCodeExporter commented 9 years ago

So, any idea how to solve this problem?

Original comment by Oduss...@gmail.com on 15 Feb 2011 at 5:06

GoogleCodeExporter commented 9 years ago

Are you able to compile code from svn? Then you can try to run tesseract 3.02. 
It looks like there is some progress, because I was able (at least) to get 
through training:

$ tesseract hiero.sethe.exp1.tiff hiero.sethe.exp1 nobatch box.train
$ unicharset_extractor hiero.sethe.exp1.box
$ shapeclustering -F font_properties -U unicharset -O hiero.unicharset hiero.sethe.exp1.tr
$ mftraining -F font_properties -U unicharset -O hiero.unicharset hiero.sethe.exp1.tr
$ mv shapetable hiero.shapetable
$ mv inttemp hiero.inttemp
$ mv pffmtable hiero.pffmtable
$ mv normproto hiero.normproto
$ combine_tessdata hiero.
$ cp -f hiero.traineddata /to/your/tessdata_dir/
$ tesseract hiero.test.png hiero.test-ocr -l hiero

During OCR, tesseract still produces errors (Error: unichar ÎŤ in normproto 
file is not in unichar set)...
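
For repeating these steps with other language or font names, the command list can be generated rather than retyped. The following is a rough Python sketch under stated assumptions: the function name and its parameters are hypothetical, it only builds and prints the commands listed above (it does not run them), and the tessdata directory is whatever the caller passes in. The optional `with_cntraining` flag is an addition not present in the steps above; it reflects the documented Tesseract 3.0x training procedure, in which `cntraining` is the tool that writes `normproto`. Whether that is related to the normproto error is only a guess.

```python
import shlex
from typing import List


def training_commands(lang: str, font: str, tessdata_dir: str,
                      exp: int = 1,
                      with_cntraining: bool = False) -> List[List[str]]:
    """Build the command list shown above for a given language and font."""
    base = f"{lang}.{font}.exp{exp}"      # e.g. hiero.sethe.exp1
    out_unicharset = f"{lang}.unicharset"
    cmds = [
        ["tesseract", f"{base}.tiff", base, "nobatch", "box.train"],
        ["unicharset_extractor", f"{base}.box"],
        ["shapeclustering", "-F", "font_properties", "-U", "unicharset",
         "-O", out_unicharset, f"{base}.tr"],
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", out_unicharset, f"{base}.tr"],
    ]
    if with_cntraining:
        # Assumption: not part of the steps listed in this comment.
        cmds.append(["cntraining", f"{base}.tr"])
    # Give the generated files the language prefix expected by combine_tessdata.
    for name in ("shapetable", "inttemp", "pffmtable", "normproto"):
        cmds.append(["mv", name, f"{lang}.{name}"])
    cmds.append(["combine_tessdata", f"{lang}."])
    cmds.append(["cp", "-f", f"{lang}.traineddata", tessdata_dir])
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands in the same "$ ..." form used above.
    for cmd in training_commands("hiero", "sethe", "/to/your/tessdata_dir/"):
        print("$ " + " ".join(shlex.quote(arg) for arg in cmd))
```

Printing instead of executing keeps the sketch safe to try; each printed line can be checked against the list above before anything is run.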

Original comment by zde...@gmail.com on 21 Feb 2012 at 10:37

GoogleCodeExporter commented 9 years ago

Can you please send me a copy of your "font_properties" file and the names of 
your source image/TIFF files?

Thanks
Richard
rca08207 (A T) bigpond.net.au

Original comment by nine.ele...@gmail.com on 30 Jun 2012 at 8:05

GoogleCodeExporter commented 9 years ago

Issue 473 has been merged into this issue.

Original comment by zde...@gmail.com on 24 Jul 2012 at 7:52

GoogleCodeExporter commented 9 years ago

I am closing this issue because it relates to an old version of the tools (3.00).
The current code has new training tools (text2image) that support training 
from a font.

Original comment by zde...@gmail.com on 1 May 2015 at 7:28

Changed state: WontFix

michaelethompson / tesseract-ocr

issue training tesseract 3.0 #430