jacklicn / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Segmentation Fault. #549

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

I successfully trained a new language for German Fraktur fonts using tesseract, 
but I get a segmentation fault when running tesseract on an image with the new 
language. The complete training process can be found in the log.txt file.

What is the expected output? What do you see instead?

The text file after recognition, but I get the following error instead:

tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file 
adaptmatch.cpp, line 512
Segmentation fault

What version of the product are you using? On what operating system?
Tesseract 3.01 on Ubuntu.

Please provide any additional information below.
The tif/box files and a log file of the complete training process are attached. 
Please help me resolve the issue. Thanks in advance.

Original issue reported on code.google.com by shahidda...@gmail.com on 12 Sep 2011 at 1:32

GoogleCodeExporter commented 9 years ago
Please follow the training instructions carefully. Your log shows that you did 
not; see the result:

TessdataManager combined tesseract data files.
Offset for type 0 is -1
Offset for type 1 is 108
Offset for type 2 is -1
Offset for type 3 is -1
Offset for type 4 is -1
Offset for type 5 is -1
Offset for type 6 is -1
Offset for type 7 is -1
Offset for type 8 is -1
Offset for type 9 is -1
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1

"-1" can be interpreted as "expected file was not found". Only deu.unicharset 
was found and used for deu.traineddata, and that is not enough.
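
The offsets listed above can also be scanned mechanically. This is a minimal sketch, assuming the combine_tessdata output has been saved to a file; the file name combine.log and the function name missing_types are hypothetical and not part of tesseract:

```shell
# Print the type numbers reported with offset -1 in saved combine_tessdata output.
# Assumes lines of the form "Offset for type N is -1", as in the log above.
missing_types() {
    awk '$1 == "Offset" && $3 == "type" && $6 == "-1" { print $4 }' "$1"
}

# Hypothetical usage: missing_types combine.log
```

Any number printed corresponds to a component that was not packed into the traineddata file.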

Original comment by zde...@gmail.com on 12 Sep 2011 at 6:30

GoogleCodeExporter commented 9 years ago
hey, i just followed the instructions in the documentation here 
(http://goo.gl/oLKQi) and got the exact same error as the OP. then a google 
search brought me to this page... :(

however, the information from the 1st responder helped me fix the issue. the 
instructions do not say we should prefix all the resulting files with 'lang.', 
but doing so fixed my issue.

in my case, i was creating a fictional language whose prefix was 'opa' and 
after I ran the following commands, i was able to produce a usable traineddata 
file:

cp Microfeat opa.Microfeat
cp pffmtable opa.pffmtable
cp normproto opa.normproto
cp mfunicharset opa.mfunicharset
cp inttemp opa.inttemp
combine_tessdata opa.
sudo cp opa.traineddata /usr/share/tessdata/

i am using archlinux, but i figure the process would be pretty much the same on 
all linuxes...
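
The copy commands above could be wrapped in a loop. This is only a sketch: the file names are the ones from the commands above, while the function name prefix_training_files is hypothetical.

```shell
# Copy each training output file to "<lang>.<name>" in the current directory.
# Returns non-zero if any of the expected files is absent.
prefix_training_files() {
    lang=$1
    status=0
    for f in Microfeat pffmtable normproto mfunicharset inttemp; do
        if [ -f "$f" ]; then
            cp "$f" "$lang.$f"
        else
            echo "missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Hypothetical usage: prefix_training_files opa && combine_tessdata opa.
```

Chaining with && keeps combine_tessdata from running when a file is absent.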

Original comment by Peter.Ca...@gmail.com on 25 Nov 2011 at 11:10

GoogleCodeExporter commented 9 years ago
You did not follow documentation [1] - "All you need to do now is collect 
together all (normproto, Microfeat, inttemp, pffmtable) the files and rename 
them with a lang. prefix..." So it is there.

[1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Putting_it_all_together

Original comment by zde...@gmail.com on 26 Nov 2011 at 7:43

GoogleCodeExporter commented 9 years ago
I just had the exact same issue, but thankfully Peter's reply was helpful in 
resolving it.

However, I feel it's a little too easy blaming the users here.
If you hadn't (also helpfully) pointed out the exact line in the tutorial 
page, I'd perhaps never have known it was there.

Let's face it, the web page is 7 (print) pages long, and of all those I needed 
a total of 7 lines of terminal commands.
Is it not entirely expected that we only ever skim through the rest? I'd say it 
is.

I'd personally like to make a few minor adjustments to make things a little 
clearer, but I'm afraid I'm not quite sure how to actually contribute to the 
source or wiki.
(Aside: perhaps you could point me in the right direction zde...?)

In either case, my apologies for reviving such an old issue, but perhaps there 
are still options to prevent future issues like this?
Could be as easy as making the quoted text into another grey code box like the 
other needed terminal commands.

Original comment by clements...@gmail.com on 3 Dec 2013 at 3:01

GoogleCodeExporter commented 9 years ago
I'm facing a similar problem. 

I was able to generate traineddata successfully for the ds digital font, but 
when I test it I get this error:

Tesseract Open Source OCR Engine v3.03 with Leptonica
tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file 
adaptmatch.cpp, line 522
Abort trap: 6

Now if I go back to running my training files, it fails too:
tesseract eng.ds-digital.exp0.tif eng.ds-digital.exp0 nobatch box.train.stderr

I was able to fix this by uninstalling and reinstalling tesseract, 
but it only succeeds once. 

If I then test my traineddata file, I get the above error, and from then 
onwards I cannot use tesseract to train.

Original comment by lgma...@gmail.com on 16 Feb 2014 at 1:04

GoogleCodeExporter commented 9 years ago
No, you WERE NOT able to generate traineddata successfully. The error message 
is clear.
BTW: why are you using/complaining about an unreleased version?

Original comment by zde...@gmail.com on 16 Feb 2014 at 3:13

GoogleCodeExporter commented 9 years ago
I was facing the same issue due to not following the doc correctly and Peter's 
answer helped.

I'd like to suggest making the relevant sentence bold in the documentation as 
it's obviously easy to miss in practice, or having combine_tessdata clearly 
warn when it just generated a useless, incomplete file.

Either of those would be really helpful imo.
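
Until something like that exists, a pre-flight check can be scripted by hand. This is a rough sketch, not a feature of combine_tessdata: the function name check_components is hypothetical, and the list of required files is an assumption drawn from this thread.

```shell
# Warn about missing or empty "<lang>.<name>" component files before combining.
# The list of names is an assumption based on this thread, not the tool's own
# definition of a complete traineddata file.
check_components() {
    lang=$1
    missing=0
    for f in unicharset inttemp pffmtable normproto; do
        if [ ! -s "$lang.$f" ]; then
            echo "warning: $lang.$f is missing or empty" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical usage: check_components opa && combine_tessdata opa.
```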

Original comment by barrdet...@gmail.com on 31 Mar 2015 at 9:44