Closed GoogleCodeExporter closed 9 years ago
Please follow the training instructions carefully. Your log shows that you did
not. See the result:
TessdataManager combined tesseract data files.
Offset for type 0 is -1
Offset for type 1 is 108
Offset for type 2 is -1
Offset for type 3 is -1
Offset for type 4 is -1
Offset for type 5 is -1
Offset for type 6 is -1
Offset for type 7 is -1
Offset for type 8 is -1
Offset for type 9 is -1
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1
"-1" can be interpreted as "expected file was not found". Only deu.unicharset
was found and used for deu.traineddata, and that is not enough.
Original comment by zde...@gmail.com
on 12 Sep 2011 at 6:30
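The -1 offsets described in the reply above can be picked out of a log mechanically. The following is only a minimal sketch: the function name missing_offsets is hypothetical, and it assumes the output lines have exactly the "Offset for type N is -1" shape shown in the log above.

```shell
# Print the component type numbers that combine_tessdata reported as missing.
# Reads combine_tessdata output on stdin; prints one type number per line.
# (missing_offsets is a hypothetical helper, not part of Tesseract.)
missing_offsets() {
    # Matching lines look like: "Offset for type 3 is -1"
    awk '/^Offset for type [0-9]+ is -1$/ { print $4 }'
}

# Possible usage, assuming the training files are in the current directory:
#   combine_tessdata deu. 2>&1 | missing_offsets
```

For the log quoted above, this would list every type except 1, which matches the observation that only deu.unicharset was found.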
Hey, I just followed the instructions in the documentation here
(http://goo.gl/oLKQi) and got the exact same error as the OP. Then a Google
search brought me to this page... :(
However, the information from the first responder helped me fix the issue. The
instructions do not say we should prefix all the resulting files with 'lang.',
but doing so fixed my issue.
In my case, I was creating a fictional language whose prefix was 'opa', and
after I ran the following commands, I was able to produce a usable traineddata
file:
cp Microfeat opa.Microfeat
cp pffmtable opa.pffmtable
cp normproto opa.normproto
cp mfunicharset opa.mfunicharset
cp inttemp opa.inttemp
combine_tessdata opa.
sudo cp opa.traineddata /usr/share/tessdata/
I am using Arch Linux, but I figure the process would be pretty much the same
on all Linux distributions...
Original comment by Peter.Ca...@gmail.com
on 25 Nov 2011 at 11:10
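The renaming steps in the comment above could also be written as a small loop. This is a hedged sketch, not an official tool: prefix_training_files is a hypothetical name, and the default file list is simply the one used in the commands above.

```shell
# Copy each training output to a lang-prefixed name, e.g. inttemp -> opa.inttemp.
# Usage: prefix_training_files LANG [FILE...]
# (prefix_training_files is a hypothetical helper, not part of Tesseract.)
prefix_training_files() {
    lang="$1"
    shift
    # With no explicit file list, fall back to the files from the comment above.
    [ "$#" -gt 0 ] || set -- Microfeat pffmtable normproto mfunicharset inttemp
    status=0
    for f in "$@"; do
        if [ -f "$f" ]; then
            cp -- "$f" "$lang.$f"
        else
            # Report the gap instead of silently producing an incomplete set.
            echo "warning: $f not found, $lang.$f not created" >&2
            status=1
        fi
    done
    return "$status"
}

# Possible usage in the training directory:
#   prefix_training_files opa && combine_tessdata opa.
```

Returning a non-zero status when a file is missing means the combine step in the usage line is skipped rather than run on a partial set.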
You did not follow the documentation [1]: "All you need to do now is collect
together all (normproto, Microfeat, inttemp, pffmtable) the files and rename
them with a lang. prefix..." So it is there.
[1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Putting_it_all_together
Original comment by zde...@gmail.com
on 26 Nov 2011 at 7:43
I just had the exact same issue, but thankfully Peter's reply was helpful in
resolving it.
However, I feel it's a little too easy blaming the users here.
If you hadn't (also helpfully) pointed out the exact line in the tutorial
page, I'd perhaps never have known it was there.
Let's face it, the web page is 7 (print) pages long, and of all those I needed
a total of 7 lines of terminal commands.
Is it not entirely expected that we only ever skim through the rest? I'd say it
is.
I'd personally like to make a few minor adjustments to make things a little
clearer, but I'm afraid I'm not quite sure how to actually contribute to the
source or wiki.
(Aside: perhaps you could point me in the right direction zde...?)
In either case, my apologies for reviving such an old issue, but perhaps there
are still options to prevent future issues like this?
Could be as easy as making the quoted text into another grey code box like the
other needed terminal commands.
Original comment by clements...@gmail.com
on 3 Dec 2013 at 3:01
I'm facing a similar problem.
I was able to generate traineddata successfully for the ds digital font, but
when I test it I get this error:
Tesseract Open Source OCR Engine v3.03 with Leptonica
tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file
adaptmatch.cpp, line 522
Abort trap: 6
Now if I go back to running my training files, it fails too:
tesseract eng.ds-digital.exp0.tif eng.ds-digital.exp0 nobatch box.train.stderr
I was able to fix this by uninstalling and reinstalling tesseract, but it only
succeeds once.
If I then try to test my traineddata file, I get the above error, and from then
onwards I cannot use tesseract to train.
Original comment by lgma...@gmail.com
on 16 Feb 2014 at 1:04
No, you WERE NOT able to generate traineddata successfully. The error message
is clear.
BTW: why are you using (and complaining about) an unreleased version?
Original comment by zde...@gmail.com
on 16 Feb 2014 at 3:13
I was facing the same issue due to not following the doc correctly and Peter's
answer helped.
I'd like to suggest making the relevant sentence bold in the documentation as
it's obviously easy to miss in practice, or having combine_tessdata clearly
warn when it just generated a useless, incomplete file.
Either of those would be really helpful imo.
Original comment by barrdet...@gmail.com
on 31 Mar 2015 at 9:44
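Until combine_tessdata warns on its own, a pre-flight check along these lines might catch the problem earlier. This is a sketch under stated assumptions: check_tessdata_inputs is a hypothetical name, and the required set (unicharset plus the four files named in the quoted documentation) is inferred from this thread, not taken from an authoritative list.

```shell
# Report which lang-prefixed inputs are missing from the current directory.
# Prints one message per missing file on stderr; returns non-zero if any is absent.
# (check_tessdata_inputs is a hypothetical helper, not part of Tesseract.)
check_tessdata_inputs() {
    lang="$1"
    missing=0
    # Assumed required set: unicharset plus the files named in the quoted docs.
    for part in unicharset inttemp pffmtable normproto Microfeat; do
        # -s is true only if the file exists and is not empty.
        if [ ! -s "$lang.$part" ]; then
            echo "missing or empty: $lang.$part" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage in the training directory:
#   check_tessdata_inputs opa && combine_tessdata opa.
```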
Original issue reported on code.google.com by
shahidda...@gmail.com
on 12 Sep 2011 at 1:32