Closed GoogleCodeExporter closed 9 years ago
Is shapeclustering.exe supposed to generate the shapetable? I followed the command line as shown in the help displayed by "shapeclustering.exe".
Original comment by withbles...@gmail.com
on 18 Feb 2012 at 3:36
Problem solved; see issue no. 626, which can be treated as closed. Now tesseract 3.02 works fine
for me.
Original comment by withbles...@gmail.com
on 18 Feb 2012 at 2:13
Original comment by zde...@gmail.com
on 18 Feb 2012 at 3:56
Issue 626 has been merged into this issue.
Original comment by zde...@gmail.com
on 18 Feb 2012 at 8:24
OS: Windows 7 64bit
Tesseract Version: 3.02
When I try to perform shapeclustering on my .tr file, I get the following error:
PS C:\Program Files (x86)\Tesseract-OCR> shapeclustering -F font_properties -U unicharset eng.xfinityLt72.exp0.box.tr
Reading eng.xfinityLt72.exp0.box.tr ...
Font id = -1/0, class id = 1/26 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file
..\..\classify\trainingsampleset.cpp, line 622
I have attached all of the files that I am using.
Original comment by akshah...@gmail.com
on 12 Feb 2013 at 3:43
Using Win 7 64bit and v3.02
I have been able to create the box files, training files and unicharset file
but cannot get the shapeclustering to work.
>>shapeclustering -F font_properties.txt -U unicharset eng.candp.img.tr
Reading eng.candp.img.tr ...
Font id = -1/0, class id = 1/105 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file
..\..\classify\trainingsampleset.cpp, line 622
This was trained on a large TIFF file, and so the .tr file is just under 12MB,
but I can provide all the data if required (cannot attach to this post however)
I have run through this process several times, altering file naming conventions
but nothing seems to make the shapeclustering work.
Original comment by crazychrisnz
on 21 Feb 2013 at 1:46
@crazychrisnz:
Error "Font id = -1/0, class id = 1/105 on sample 0" means that the font name is
not in font_properties, or that font_properties does not meet the
[http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Requirements_for_text_input_files requirements].
Also, your filename is wrong (it does not follow the wiki).
Original comment by zde...@gmail.com
on 21 Feb 2013 at 8:01
"Text input files need to meet this criteria: UTF-8 encoding without BOM" - The
box and .tr files were just UTF-8 (with BOM?), so I resaved them as UTF-8 without BOM and
training worked a treat!
Thanks
Original comment by crazychrisnz
on 21 Feb 2013 at 7:44
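The BOM check described above can also be done outside an editor. This is a minimal sketch, assuming only that the "BOM" in question is the three-byte UTF-8 signature (EF BB BF) at the very start of the file. The function names are illustrative and are not part of Tesseract.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file starts with the 3-byte UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as fh:
        return fh.read(3) == codecs.BOM_UTF8


def strip_utf8_bom(path):
    """Rewrite the file without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False if the file was
    left untouched.
    """
    data = Path(path).read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    Path(path).write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Running has_utf8_bom on font_properties, unicharset and the box file before training shows whether resaving is needed at all. Keep a backup before calling strip_utf8_bom, since it overwrites the file in place.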
I'm getting this error too, although Notepad++ is telling me that all my files
are already UTF-8 without BOM! My error is:
Reading eng.lettergothic.exp0.box.tr ...
Font id = -1/0, class id = 53/55 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file
..\..\classify\trainingsampleset.cpp, line 622
Also I think the wiki should be corrected to:
shapeclustering -F font_properties -U unicharset eng.timesitalic.exp0.box.tr
instead of:
shapeclustering -F font_properties -U unicharset eng.timesitalic.exp0.tr
Original comment by marki...@gmail.com
on 14 Jun 2013 at 9:00
Does the shapeclustering program pull the font name from the file name of the
.tr file? If that is the case, I think my font_properties is fine - utf8, no
BOM, ending in \n.
Here it is:
lettergothic 1 0 0 1 0
Original comment by marki...@gmail.com
on 14 Jun 2013 at 9:05
@markisus: if your .tr filename is eng.timesitalic.exp0.box.tr, then you are not
following the wiki.
Original comment by zde...@gmail.com
on 15 Jun 2013 at 7:00
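The naming rule is not spelled out in this thread, so the following is only a sketch. It assumes the expected form is lang.fontname.expN.tr, inferred from the wiki example quoted earlier (eng.timesitalic.exp0.tr). The pattern and the function name are illustrative, not something Tesseract provides.

```python
import re

# Assumed convention: <lang>.<fontname>.exp<N>.tr, with no extra
# components such as ".box" or ".img" before ".tr".
TR_NAME = re.compile(r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.tr$")


def parse_tr_name(filename):
    """Split a .tr filename into (lang, fontname, exp number).

    Returns None if the name does not match the assumed convention.
    """
    match = TR_NAME.match(filename)
    if match is None:
        return None
    return match.group("lang"), match.group("font"), int(match.group("num"))
```

Under this assumption, parse_tr_name("eng.timesitalic.exp0.tr") yields the font name timesitalic, while names such as eng.lettergothic.exp0.box.tr or eng.candp.img.tr are rejected.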
[deleted comment]
Check your *.tr file. In my case it contains font names like
"fontname.exp0 9 448 3231 467 3263 0".
Remove the ".exp0" suffix from each such line in the file.
Also use the "fontname 1 0 0 1 0" format in the font_properties file instead of
"lang.fontname.box 1 0 0 1 0".
Original comment by skat....@gmail.com
on 18 Dec 2013 at 1:54
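The manual fix above can be scripted. This is a rough sketch that assumes the font name is the first whitespace-separated field of a line, as in the "fontname.exp0 9 448 3231 467 3263 0" sample, and that every other line of the .tr file should be left untouched. Both function names are hypothetical helpers.

```python
import re

# Matches a first field ending in ".exp<N>" when it is followed by a
# space or tab, e.g. "fontname.exp0 9 448 ...".
_EXP_SUFFIX = re.compile(r"^(\S+?)\.exp\d+(?=[ \t])", re.MULTILINE)


def strip_exp_suffix(tr_text):
    """Drop a trailing '.expN' from the first field of each line that has one."""
    return _EXP_SUFFIX.sub(r"\1", tr_text)


def read_font_names(font_properties_text):
    """Collect the font names (first field of each non-empty line)."""
    names = set()
    for line in font_properties_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names
```

After cleaning, the font name at the start of those .tr lines should appear in the set returned by read_font_names for the font_properties file; a name missing from that set would be consistent with the "Font id = -1/0" message discussed above.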
Original issue reported on code.google.com by
withbles...@gmail.com
on 18 Feb 2012 at 3:26