mithilesh1125 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

shapeclustering.exe -Assert failed #625

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

M:\rao- files\chilume\test-3.02>shapeclustering.exe -S shapetable -F font_properties -U unicharset  -O lang1.unicharset eng.arial.eurotext.tr
Reading eng.arial.eurotext.tr ...
Font id = -1/0, class id = 1/66 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file ..\classify\trainingsampleset.cpp, line 622

What is the expected output? What do you see instead?
I don't know what the expected output is, but I see this error message:
"Font id = -1/0, class id = 1/66 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file ..\classify\trainingsampleset.cpp, line 622"

What version of the product are you using? On what operating system?
tesseract-ocr 3.02-Alpha on Windows XP (SP3)

Please provide any additional information below.
I have also attached an extract of the CMD session showing all the command lines I used and their output.
Where did I make a mistake?

Original issue reported on code.google.com by withbles...@gmail.com on 18 Feb 2012 at 3:26

GoogleCodeExporter commented 9 years ago
I presume that shapeclustering.exe is supposed to generate the shapetable? I followed the command line shown in the help displayed by running "shapeclustering.exe".

Original comment by withbles...@gmail.com on 18 Feb 2012 at 3:36

GoogleCodeExporter commented 9 years ago
Problem solved; see issue 626. Please treat this issue as closed. Tesseract 3.02 now works fine
for me.

Original comment by withbles...@gmail.com on 18 Feb 2012 at 2:13

GoogleCodeExporter commented 9 years ago

Original comment by zde...@gmail.com on 18 Feb 2012 at 3:56

GoogleCodeExporter commented 9 years ago
Issue 626 has been merged into this issue.

Original comment by zde...@gmail.com on 18 Feb 2012 at 8:24

GoogleCodeExporter commented 9 years ago
OS: Windows 7 64bit
Tesseract Version: 3.02

When I try to perform shapeclustering on my .tr file, I get the following error:

PS C:\Program Files (x86)\Tesseract-OCR> shapeclustering -F font_properties -U unicharset eng.xfinityLt72.exp0.box.tr
Reading eng.xfinityLt72.exp0.box.tr ...
Font id = -1/0, class id = 1/26 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file ..\..\classify\trainingsampleset.cpp, line 622

I have attached all of the files that I am using.

Original comment by akshah...@gmail.com on 12 Feb 2013 at 3:43

GoogleCodeExporter commented 9 years ago
Using Win 7 64bit and v3.02

I have been able to create the box files, training files and unicharset file 
but cannot get the shapeclustering to work.

>>shapeclustering -F font_properties.txt -U unicharset eng.candp.img.tr
Reading eng.candp.img.tr ...
Font id = -1/0, class id = 1/105 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file ..\..\classify\trainingsampleset.cpp, line 622

This was trained on a large TIFF file, so the .tr file is just under 12 MB,
but I can provide all the data if required (I cannot attach it to this post, however).

I have run through this process several times, altering the file naming conventions,
but nothing seems to make shapeclustering work.

Original comment by crazychrisnz on 21 Feb 2013 at 1:46

GoogleCodeExporter commented 9 years ago
@crazychrisnz:
Error "Font id = -1/0, class id = 1/105 on sample 0" means that the font name is
not in font_properties, or that font_properties does not meet the
[http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Requirements_for_text_input_files requirements].

Also, your file name is wrong (i.e. it does not follow the wiki).

Original comment by zde...@gmail.com on 21 Feb 2013 at 8:01
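
Both causes named in the reply above can be checked mechanically before running shapeclustering. The sketch below is a hypothetical helper, not part of Tesseract. It assumes the documented font_properties line format `<fontname> <italic> <bold> <fixed> <serif> <fraktur>` with each flag 0 or 1, and it mirrors the text-file requirements discussed in this thread (UTF-8 without BOM, last line ending in \n). Blank lines are simply skipped here, which may be more lenient than Tesseract itself.

```python
# Hypothetical helper, not part of Tesseract: sanity-checks font_properties
# against the causes of "Font id = -1" discussed in this thread.


def check_font_properties(data: bytes, font_name: str) -> list[str]:
    """Return a list of problems; an empty list means none were found.

    Assumes each line is: <fontname> <italic> <bold> <fixed> <serif> <fraktur>,
    where every flag is 0 or 1.
    """
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 BOM")
        data = data[3:]
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        return problems + [f"not valid UTF-8: {err}"]
    if text and not text.endswith("\n"):
        problems.append("last line does not end with a newline")
    names = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # this sketch simply skips blank lines
        if len(fields) != 6 or any(f not in ("0", "1") for f in fields[1:]):
            problems.append(
                f"line {lineno}: expected a name plus five 0/1 flags, got {line!r}"
            )
            continue
        names.add(fields[0])
    if font_name not in names:
        problems.append(f"font name {font_name!r} is not listed")
    return problems


if __name__ == "__main__":
    sample = b"lettergothic 1 0 0 1 0\n"
    print(check_font_properties(sample, "lettergothic"))  # prints []
```

To check a real file, pass `open("font_properties", "rb").read()` together with the font name you expect the trainer to have recorded.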

GoogleCodeExporter commented 9 years ago
"Text input files need to meet this criteria: UTF-8 encoding without BOM". The
box and .tr files were plain UTF-8 (with BOM?), so I re-saved them as UTF-8 without BOM and
training worked a treat!

Thanks

Original comment by crazychrisnz on 21 Feb 2013 at 7:44
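
The fix reported above was to re-save the files as UTF-8 without BOM. Without an editor at hand, the BOM (the three bytes EF BB BF at the very start of a file) can also be removed with a short script. The sketch below uses hypothetical helper names; it only drops those leading bytes and leaves the rest of the file byte-for-byte unchanged.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if it has one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: Path) -> bool:
    """Rewrite the file without its BOM; return True if a BOM was removed."""
    data = path.read_bytes()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:
        path.write_bytes(cleaned)
        return True
    return False
```

For example, `strip_bom_in_place(Path("font_properties"))` returns False when the file was already BOM-free, so it is safe to run more than once.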

GoogleCodeExporter commented 9 years ago
I'm getting this error too, although Notepad++ tells me that all my files
are already UTF-8 without BOM! My error is:

Reading eng.lettergothic.exp0.box.tr ...
Font id = -1/0, class id = 53/55 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file ..\..\classify\trainingsampleset.cpp, line 622

Also I think the wiki should be corrected to:
shapeclustering -F font_properties -U unicharset eng.timesitalic.exp0.box.tr

instead of:
shapeclustering -F font_properties -U unicharset eng.timesitalic.exp0.tr

Original comment by marki...@gmail.com on 14 Jun 2013 at 9:00

GoogleCodeExporter commented 9 years ago
Does the shapeclustering program take the font name from the file name of the
.tr file? If so, I think my font_properties is fine: UTF-8, no
BOM, ending in \n.

Here it is:
lettergothic 1 0 0 1 0

Original comment by marki...@gmail.com on 14 Jun 2013 at 9:05

GoogleCodeExporter commented 9 years ago
@markisus: if your .tr file name is eng.timesitalic.exp0.box.tr, then you are not
following the wiki.

Original comment by zde...@gmail.com on 15 Jun 2013 at 7:00
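
On the question of where the font name comes from: our reading of the 3.02 training code, which is an assumption and not verified here, is that box.train records as the font name the text between the first and the last dot of the output base name. That would explain why a base name like eng.timesitalic.exp0.box fails, and also the `fontname.exp0` entries reported in the last comment of this thread. The hypothetical function below only mimics that assumed rule, to show which name font_properties would then need to contain.

```python
# Hypothetical sketch of the ASSUMED 3.02 rule: font name = text between the
# first and last '.' of the output base name (directory part removed).


def font_name_from_base(base: str) -> str:
    """Guess the font name recorded for a training output base name.

    If the file name has fewer than two dots, it is returned unchanged.
    """
    # Accept both Windows and POSIX separators.
    name = base.replace("\\", "/").rsplit("/", 1)[-1]
    first = name.find(".")
    last = name.rfind(".")
    if first == -1 or first == last:
        return name
    return name[first + 1:last]


if __name__ == "__main__":
    for base in ("eng.timesitalic.exp0", "eng.timesitalic.exp0.box"):
        print(base, "->", font_name_from_base(base))
```

Under this assumption, eng.timesitalic.exp0 yields timesitalic, which matches a font_properties line starting with `timesitalic`, while eng.timesitalic.exp0.box yields timesitalic.exp0, which does not.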

GoogleCodeExporter commented 9 years ago
GoogleCodeExporter commented 9 years ago
Check your *.tr file. In my case it contains font names like "fontname.exp0 9 448 3231 467 3263 0".
Remove the ".exp0" suffix from each such line in the file.

Also use the format "fontname 1 0 0 1 0" in the font_properties file instead of
"lang.fontname.box 1 0 0 1 0".

Original comment by skat....@gmail.com on 18 Dec 2013 at 1:54
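
The manual edit suggested above can be scripted. The sketch below is a hypothetical helper, not a Tesseract tool. It assumes the .tr file is UTF-8 text and that the font name is the first space-separated token on the affected lines, as in the example above. Keep a backup before rewriting; regenerating the .tr file from a correctly named base, as the earlier comments recommend, is the cleaner alternative.

```python
# Hypothetical helper for the manual edit suggested above; not a Tesseract tool.


def strip_font_suffix(tr_text: str, suffix: str = ".exp0") -> str:
    """Drop `suffix` from the first space-separated token of each line.

    Only lines whose first token ends with `suffix` (for example
    "fontname.exp0 9 448 ...") are changed; all other lines are returned
    unchanged, including any '\r' before the newline.
    """
    lines = tr_text.split("\n")
    for i, line in enumerate(lines):
        head, sep, rest = line.partition(" ")
        if sep and len(head) > len(suffix) and head.endswith(suffix):
            lines[i] = head[: -len(suffix)] + sep + rest
    return "\n".join(lines)


def fix_tr_file(path: str, suffix: str = ".exp0") -> None:
    """Rewrite a .tr file in place; newline="" keeps its line endings as-is."""
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(strip_font_suffix(text, suffix))
```

After running `fix_tr_file("eng.fontname.exp0.box.tr")` (file name illustrative), the font_properties entry should then use the bare `fontname`.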