I'm having the same error, more info in a second
Original comment by sebek.m...@gmail.com
on 21 Sep 2011 at 5:40
mftraining.exe -F font_properties blue.test.exp1.tr blue.test.exp2.tr
Reading blue.test.exp1.tr...
Reading blue.test.exp2.tr...
Class->NumConfigs == this->fontset_table_.get(Class->font_set_id).size:
Error:Assert failed: in file ..\\classify\intproto.cpp, line 1312
Original comment by sebek.m...@gmail.com
on 21 Sep 2011 at 5:47
And that is with the latest SVN on Windows 7 32-bit.
Original comment by sebek.m...@gmail.com
on 21 Sep 2011 at 5:48
Do you have any solution?
I was getting errors from mftraining with a multi-page TIFF, so I tried training
with all the TIFF files as single pages. I have 68 TIFF files from which I
generated the box files. Could the problem be the number of files?
Original comment by mervet2...@gmail.com
on 29 Sep 2011 at 1:50
What exactly was your error? Try to post more information so when someone comes
along who knows what they're doing they can implement a fix.
It seems like a bug that anyone with some decent amount of experience
developing tesseract would be able to handle quickly, but I wasn't successful
in acquainting myself with the program's structure in the time I had available.
I was feeding multiple single-page tiffs into mftraining when it crashed, but
again it worked when they were fed individually.
Have you tried feeding in only two files to see whether mftraining still crashes?
Original comment by sebek.m...@gmail.com
on 29 Sep 2011 at 9:16
This is a "feature" but it will be fixed in 3.02.
Currently each tr file *must* represent a different font, as it will create a
different config and the code assumes that there is only one config per font,
hence the assert.
WORK-AROUND 1: Use a multi-page tiff for multiple images with the same font.
They will go into a single tr file during the box.train phase.
WORK-AROUND 2: Cat together multiple tr files that represent the same font.
WORK-AROUND 3: Use a different font name and create a different entry for it in
the font_properties file.
A future version, probably 3.02, will use the font name contained in the tr
file instead of the file name, and sort the font data on reading the tr files,
and this restriction will be lifted.
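WORK-AROUND 2 could be scripted roughly as follows. This is only a sketch: it assumes the tr files follow the lang.fontname.expN.tr naming seen in this thread, and both the function name merge_tr_by_font and the merged .exp0.tr output names are made up for illustration, not part of Tesseract.

```shell
#!/bin/sh
# Sketch of WORK-AROUND 2: merge .tr files that belong to the same font.
# Assumption: inputs are named lang.fontname.expN.tr, as in this thread.
# merge_tr_by_font and the merged *.exp0.tr names are hypothetical.
merge_tr_by_font() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    # Pass 1 removes stale merged files, pass 2 appends the inputs.
    for pass in clear append; do
        for f in "$@"; do
            base=${f##*/}              # drop any directory part
            case $base in
                *.exp*.tr) ;;          # expected naming, keep going
                *) continue ;;         # ignore names we cannot parse
            esac
            # Strip the trailing .expN.tr to get lang.fontname.
            merged="$outdir/${base%.exp*.tr}.exp0.tr"
            if [ "$pass" = clear ]; then
                rm -f "$merged"        # avoid appending to an old result
            else
                cat "$f" >> "$merged"  # same font goes to the same file
            fi
        done
    done
}
```

Called as, say, `merge_tr_by_font merged blue.test.exp1.tr blue.test.exp2.tr`, this would leave a single merged/blue.test.exp0.tr that can be passed to mftraining in place of the two originals. The output directory should be different from the input directory, since the first pass deletes any existing file with the merged name.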
Original comment by theraysm...@gmail.com
on 1 Oct 2011 at 4:29
Issue 578 has been merged into this issue.
Original comment by zde...@gmail.com
on 18 Nov 2011 at 4:43
Issue 587 has been merged into this issue.
Original comment by zde...@gmail.com
on 24 Nov 2011 at 8:15
Issue 562 has been merged into this issue.
Original comment by zde...@gmail.com
on 23 Feb 2012 at 8:20
Please test current svn code (3.02):
tesseract eng.sysd.exp0.tif eng.sysd.exp0 box.train
tesseract eng.sysd.exp1.tif eng.sysd.exp1 box.train
unicharset_extractor eng.sysd.exp0.box eng.sysd.exp1.box
shapeclustering -F font_properties -U unicharset eng.sysd.exp0.tr eng.sysd.exp1.tr
mftraining -F eng.font_properties -U unicharset -O eng.unicharset eng.sysd.exp0.tr eng.sysd.exp1.tr
Original comment by zde...@gmail.com
on 30 Jul 2012 at 8:32
Original issue reported on code.google.com by
nickkeln...@gmail.com
on 21 Aug 2011 at 7:19