Status: Closed (GoogleCodeExporter)
Please post example files: the TIFF, the unmodified box file, and the modified box file.
Original comment by zde...@gmail.com
on 15 Jan 2011 at 7:33
Hi!
I tried to post the files a few days ago, but it seems it didn't work, so I'm
trying again now.
Attached are the image I used, the modified .box file, and the .tr file.
I have also tried splitting the TIFF image into a multi-page TIFF, but that
doesn't work either.
Original comment by Oduss...@gmail.com
on 27 Jan 2011 at 2:34
Interesting task! What font do you use for displaying hieroglyphs? What program
did you use for editing the box file?
Original comment by zde...@gmail.com
on 28 Jan 2011 at 9:06
And also a little bit crazy, I think.
As for the font, I am using one I created myself. It is quite good; I tried to
upload it here, but it seems that Google doesn't accept it.
However, you can use this font:
http://www.alanwood.net/unicode/egyptian-hieroglyphs.html
http://www.alanwood.net/unicode/fonts-african.html#egyptianhieroglyphs
It uses the same Unicode code points I use, so it is a valid alternative, at
least for testing.
As for the program, it's a homemade solution based on this program for
writing Chinese and other non-Latin scripts:
http://openvanilla.org/index-en.php
I created an additional input method that lets me display the signs by typing
their Gardiner codes (here is the Gardiner list:
http://de.wikipedia.org/wiki/Gardiner-Liste )
So... interesting task, or impossible task?
Could it be that the TIFF is too big? Maybe it has too many lines, or too many signs?
Original comment by Oduss...@gmail.com
on 28 Jan 2011 at 4:33
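A rough way to see how "big" the page really is, from Tesseract's point of view, is to count what the .box file contains. This is a minimal sketch, assuming the Tesseract 3.x box format (`<char> <left> <bottom> <right> <top> <page>`, one box per line); `box_stats` is a hypothetical helper name, and it does not encode any known Tesseract size limit, it only reports the numbers so the first (working) box file can be compared with the second (failing) one.

```shell
# Hypothetical helper: summarise a Tesseract 3.x box file.
# Assumed line format: <char> <left> <bottom> <right> <top> <page>
box_stats() {
  awk '{ boxes++; chars[$1] = 1; pages[$6] = 1 }
       END {
         nc = 0; for (c in chars) nc++   # distinct characters (field 1)
         np = 0; for (p in pages) np++   # distinct page numbers (field 6)
         printf "boxes=%d distinct_chars=%d pages=%d\n", boxes, nc, np
       }' "$1"
}
```

Usage: `box_stats hiero.sethe.exp1.box`, run once on each box file.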
Ah, I forgot to say that I did it in two steps: first I created a .box file
covering roughly one third of the .tiff, and then I used the traineddata
obtained from it to create the box file for the whole page.
The first .box worked without problems; the second one gives me this issue.
Original comment by Oduss...@gmail.com
on 28 Jan 2011 at 4:37
So, any idea how to solve this problem?
Original comment by Oduss...@gmail.com
on 15 Feb 2011 at 5:06
Are you able to compile code from SVN? Then you can try running Tesseract 3.02.
It looks like there is some progress, because I was (at least) able to get
through training:
$ tesseract hiero.sethe.exp1.tiff hiero.sethe.exp1 nobatch box.train
$ unicharset_extractor hiero.sethe.exp1.box
$ shapeclustering -F font_properties -U unicharset -O hiero.unicharset hiero.sethe.exp1.tr
$ mftraining -F font_properties -U unicharset -O hiero.unicharset hiero.sethe.exp1.tr
$ cntraining hiero.sethe.exp1.tr
$ mv shapetable hiero.shapetable
$ mv inttemp hiero.inttemp
$ mv pffmtable hiero.pffmtable
$ mv normproto hiero.normproto
$ combine_tessdata hiero.
$ cp -f hiero.traineddata /path/to/your/tessdata_dir/
$ tesseract hiero.test.png hiero.test-ocr -l hiero
During OCR, tesseract still produces errors (Error: unichar ΓΕ€ in normproto
file is not in unichar set).
Original comment by zde...@gmail.com
on 21 Feb 2012 at 10:37
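The error reports a unichar that the unicharset does not know about. One quick check is to list the characters that appear in a box file but not in a given unicharset (for example hiero.unicharset). This is a minimal sketch, assuming the Tesseract 3.x text formats (box file: the character is the first field of each line; unicharset: an entry count on the first line, then one entry per line with the unichar as the first field); `missing_unichars` is a hypothetical helper, and an empty result only rules out this one kind of mismatch.

```shell
# Hypothetical helper: print characters present in a box file but absent
# from a unicharset file, each reported once.
# Assumed formats (Tesseract 3.x):
#   unicharset: line 1 = entry count, then "<unichar> <properties...>" per line
#   box file:   "<char> <left> <bottom> <right> <top> <page>" per line
# While awk reads the first file (NR == FNR), it records the known unichars;
# for the second file it prints each unknown first field the first time it appears.
missing_unichars() {
  awk 'NR == FNR { if (FNR > 1) known[$1] = 1; next }
       !($1 in known) && !($1 in reported) { print $1; reported[$1] = 1 }' "$1" "$2"
}
```

Usage: `missing_unichars hiero.unicharset hiero.sethe.exp1.box`. Reading the unicharset first lets awk build the lookup table in one pass before scanning the box file.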
Can you please send me a copy of your "font_properties" file and the names of
your source image/TIFF files?
Thanks
Richard
rca08207 (A T) bigpond.net.au
Original comment by nine.ele...@gmail.com
on 30 Jun 2012 at 8:05
Issue 473 has been merged into this issue.
Original comment by zde...@gmail.com
on 24 Jul 2012 at 7:52
I am closing this issue because it relates to an old version of the tools (3.00).
The current code has new training tools (text2image) that support training
from a font.
Original comment by zde...@gmail.com
on 1 May 2015 at 7:28
Original issue reported on code.google.com by
Oduss...@gmail.com
on 15 Jan 2011 at 4:04