mithilesh1125 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

tesseract segfaults on Centos 6.2 #641

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Run the following command:
/usr/local/bin/tesseract /dir/logo.gif.tif /dir/logo.gif

What is the expected output? What do you see instead?

The file should be converted to text. Instead, tesseract crashes with a segmentation fault.

What version of the product are you using? On what operating system?

OS Info:
# uname -a
Linux 2.6.32-220.el6.i686 #1 SMP Tue Dec 6 16:15:40 GMT 2011 i686 i686 i386 GNU/Linux

# cat /etc/issue
CentOS release 6.2 (Final)
Kernel \r on an \m

# tesseract -v
tesseract 3.01

leptonica version is: leptonica-1.68

The input file is of the following type:

# identify /dir/logo.gif.tif
/dir/logo.gif.tif TIFF 754x144 754x144+0+0 8-bit Grayscale DirectClass 23.3kb

Please provide any additional information below.
The following is the output from gdb:

Program received signal SIGSEGV, Segmentation fault.
tesseract::Classify::ComputeIntCharNormArray (this=0x809c798, NormFeature=0x80ddc10, Templates=0x0, CharNormArray=0xbfffb738 "") at float2int.cpp:84
84        for (i = 0; i < Templates->NumClasses; i++) {
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.i686 libgcc-4.4.6-3.el6.i686 libjpeg-6b-46.el6.i686 libstdc++-4.4.6-3.el6.i686 libtiff-3.9.4-1.el6_0.3.i686 zlib-1.2.3-27.el6.i686
(gdb) where
#0  tesseract::Classify::ComputeIntCharNormArray (this=0x809c798, NormFeature=0x80ddc10, Templates=0x0, CharNormArray=0xbfffb738 "") at float2int.cpp:84
#1  0x002b32b2 in tesseract::Classify::GetIntCharNormFeatures (this=0x809c798, Blob=0x80ab4a0, Templates=0x0, IntFeatures=0xbfffd738, CharNormArray=0xbfffb738 "", BlobLength=0xb7fb3008, FeatureOutlineArray=0x0) at adaptmatch.cpp:2066
#2  0x002b33a7 in tesseract::Classify::GetCharNormFeatures (this=0x809c798, Blob=0x80ab4a0, Templates=0x0, IntFeatures=0xbfffd738, CharNormArray=0xbfffb738 "", BlobLength=0xb7fb3008, FeatureOutlineIndex=0x0) at adaptmatch.cpp:1916
#3  0x002b4940 in tesseract::Classify::CharNormClassifier (this=0x809c798, Blob=0x80ab4a0, Templates=0x0, Results=0xb7fb3008) at adaptmatch.cpp:1389
#4  0x002b54ad in tesseract::Classify::DoAdaptiveMatch (this=0x809c798, Blob=0x80ab4a0, Results=0xb7fb3008) at adaptmatch.cpp:1626
#5  0x002b7aa2 in tesseract::Classify::AdaptiveClassifier (this=0x809c798, Blob=0x80ab4a0, Choices=0x80ddc00, CPResults=0x0) at adaptmatch.cpp:183
#6  0x002ad310 in tesseract::Wordrec::call_matcher (this=0x809c798, tessblob=0x80ab4a0) at tface.cpp:179
#7  0x002ad9ff in tesseract::Wordrec::classify_blob (this=0x809c798, blob=0x80ab4a0, string=0x34f802 "chop_word:", color=Green) at wordclass.cpp:71
#8  0x00298d34 in tesseract::Wordrec::chop_word_main (this=0x809c798, word=0x80ab240) at chopper.cpp:510
#9  0x002ad444 in tesseract::Wordrec::cc_recog (this=0x809c798, word=0x80ab240) at tface.cpp:121
#10 0x001d3add in tesseract::Tesseract::recog_word_recursive (this=0x809c798, word=0x80ab240, blob_choices=0x80aa568) at tfacepp.cpp:114
#11 0x001d4b9d in tesseract::Tesseract::recog_word (this=0x809c798, word=0x80ab240, blob_choices=0x80aa568) at tfacepp.cpp:55
#12 0x001c92e2 in tesseract::Tesseract::tess_segment_pass1 (this=0x809c798, word=0x80ab240, blob_choices=0x80aa568) at tessbox.cpp:56
#13 0x001a6bce in tesseract::Tesseract::classify_word_pass1 (this=0x809c798, word=0x80ab240, row=0x80d4fd0, block=0x80ab6d0) at control.cpp:490
#14 0x001a8aa6 in tesseract::Tesseract::recog_all_words (this=0x809c798, page_res=0x80aa810, monitor=0x0, target_word_box=0x0, word_config=0x0, dopasses=0) at control.cpp:264
#15 0x00193db1 in tesseract::TessBaseAPI::Recognize (this=0xbffff614, monitor=0x0) at baseapi.cpp:559
#16 0x00196a35 in tesseract::TessBaseAPI::ProcessPage (this=0xbffff614, pix=0x80a85f0, page_index=0, filename=0xbffff880 "/var/www/vhosts/ocrconvert.com/httpdocs/processed/4f4fe0e4cb68e/logo.gif.tif", retry_config=0x0, timeout_millisec=0, text_out=0xbffff664) at baseapi.cpp:732
#17 0x00196d02 in tesseract::TessBaseAPI::ProcessPages (this=0xbffff614, filename=0xbffff880 "/var/www/vhosts/ocrconvert.com/httpdocs/processed/4f4fe0e4cb68e/logo.gif.tif", retry_config=0x0, timeout_millisec=0, text_out=0xbffff664) at baseapi.cpp:648
#18 0x08048fc2 in main (argc=3, argv=0xbffff734) at ../api/tesseractmain.cpp:138
(gdb)

Original issue reported on code.google.com by easternc...@gmail.com on 1 Mar 2012 at 9:04

GoogleCodeExporter commented 9 years ago
Can you please provide a test image? I can try it with tesseract from svn.

Original comment by zde...@gmail.com on 2 Mar 2012 at 7:25

GoogleCodeExporter commented 9 years ago
I've attached the image that causes the segfault.

Original comment by easternc...@gmail.com on 2 Mar 2012 at 9:06

GoogleCodeExporter commented 9 years ago
I cannot reproduce the problem. I tried it on Windows XP SP3 and it ran without 
problems. I also tried it on CentOS 5.7 (I do not have CentOS 6.2) with tesseract 
3.01 and 3.00, and it worked.
So it does not seem to be a tesseract problem.

Did you build tesseract yourself or did you install it from a package?
Can you send the output of 'ldd /usr/local/bin/tesseract'?

Original comment by zde...@gmail.com on 2 Mar 2012 at 1:45

GoogleCodeExporter commented 9 years ago
Yes, I compiled it.

The following is the output from ldd:

$ ldd /usr/local/bin/tesseract
        linux-gate.so.1 =>  (0x00c3d000)
        libtesseract.so.3 => /usr/local/lib/libtesseract.so.3 (0x00292000)
        liblept.so.2 => /usr/local/lib/liblept.so.2 (0x00110000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00ad0000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x007bb000)
        libm.so.6 => /lib/libm.so.6 (0x00560000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x0058a000)
        libc.so.6 => /lib/libc.so.6 (0x005a8000)
        libtiff.so.3 => /usr/lib/libtiff.so.3 (0x00738000)
        /lib/ld-linux.so.2 (0x00f5c000)
        libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0x00ba6000)
        libz.so.1 => /lib/libz.so.1 (0x0079a000)

Original comment by easternc...@gmail.com on 2 Mar 2012 at 8:38

GoogleCodeExporter commented 9 years ago
Just to confirm: have you installed the required language data files (which 
were updated for v3.01)?  I can reproduce this exact error (i.e., a SEGFAULT in 
ComputeIntCharNormArray) if the "XXX.traineddata" (and associated files) aren't 
present (or aren't readable in the expected location).  The language-specific 
training files are available from the downloads page.

Original comment by courtney...@gmail.com on 18 Mar 2012 at 4:48
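
One way to rule this out before running tesseract is to check that the data file sits where tesseract will look for it. The sketch below is only an illustration: it assumes that TESSDATA_PREFIX names the parent of the tessdata directory and that a build installed under /usr/local falls back to /usr/local/share/, both of which should be verified against your own build. The function name and the default prefix are hypothetical and not part of tesseract.

```python
import os

# Assumed fallback for a build installed under /usr/local; adjust for your system.
DEFAULT_PREFIX = "/usr/local/share/"


def traineddata_status(lang="eng", prefix=None):
    """Return (path, ok): the expected <lang>.traineddata path and whether it is readable."""
    if prefix is None:
        # Assumption: TESSDATA_PREFIX is the parent directory of tessdata/.
        prefix = os.environ.get("TESSDATA_PREFIX", DEFAULT_PREFIX)
    path = os.path.join(prefix, "tessdata", lang + ".traineddata")
    ok = os.path.isfile(path) and os.access(path, os.R_OK)
    return path, ok


if __name__ == "__main__":
    path, ok = traineddata_status("eng")
    print(path, "readable" if ok else "MISSING or unreadable")
```

If the file is reported as missing or unreadable, installing the language data or fixing its permissions is the first thing to try.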

GoogleCodeExporter commented 9 years ago
Hi,

Thanks, works like a charm after installing the language pack.

Can you please tell me how I can install all the language packs?

Original comment by easternc...@gmail.com on 18 Mar 2012 at 10:47

GoogleCodeExporter commented 9 years ago
If you downloaded and used tesseract-3.01.tar.gz, then you have to download and 
install all the relevant files manually.

Or you can use the svn version: install svn and then run:
$ svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
$ cd tesseract-ocr

and follow the instructions in INSTALL.SVN [1]. But be aware: the language data 
files are more than 600MB!

[1] http://code.google.com/p/tesseract-ocr/source/browse/trunk/INSTALL.SVN

Original comment by zde...@gmail.com on 18 Mar 2012 at 8:03
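
For the manual route, each downloaded language archive has to be unpacked so that its *.traineddata files end up in the tessdata directory. A minimal sketch, assuming the archives are .tar.gz files that contain the data files somewhere inside; the function name is hypothetical, and the archive layout should be checked against what you actually downloaded:

```python
import os
import tarfile


def install_traineddata(archive_path, tessdata_dir):
    """Copy every *.traineddata member of a .tar.gz archive into tessdata_dir.

    Returns the sorted list of installed file names.
    """
    os.makedirs(tessdata_dir, exist_ok=True)
    installed = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            name = os.path.basename(member.name)
            if not member.isfile() or not name.endswith(".traineddata"):
                continue
            src = tar.extractfile(member)
            # Write under the base name only, so directories inside the archive are ignored.
            with src, open(os.path.join(tessdata_dir, name), "wb") as dst:
                dst.write(src.read())
            installed.append(name)
    return sorted(installed)
```

Calling it once per downloaded archive, with the tessdata directory of your installation as the second argument, installs the languages one by one. The files must also be readable by the user that runs tesseract.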