meh / ruby-tesseract-ocr

A Ruby wrapper library to the tesseract-ocr API.

Error getting setup #3

Closed emlama closed 12 years ago

emlama commented 12 years ago

Hey there, I'm really grateful that you've put together this gem. I'm looking to do some OCR work for a grad school project and would love to be working in Ruby, and Tesseract seems like the right OCR library to work with.

So far I'm having a hard time getting up and running though. Below are the crashes that happen when I try to require 'tesseract' or 'tesseract-ocr' in IRB. Could you take a look at what I'm doing and see if anything is obviously wrong?

FYI things I know/am running

When I try to require Tesseract in IRB, it throws the following error:

ruby-1.9.2-head > require 'tesseract'
NameError: uninitialized constant Object::Tesseract
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/tesseract-ocr-0.1.3/bin/tesseract.rb:57:in `<top (required)>'
        from <internal:lib/rubygems/custom_require>:33:in `require'
        from <internal:lib/rubygems/custom_require>:33:in `rescue in require'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from (irb):1
        from /Users/matthewforr/.rvm/rubies/ruby-1.9.2-head/bin/irb:17:in `<main>'

And when I require Tesseract-OCR I get this error:

ruby-1.9.2-head > require 'tesseract-ocr'
CompilationError: compile error: see logs at /var/folders/Iq/IqwSThA+GH4e2Qhf3ycWO++++TI/-Tmp-/.ffi-inliner-501/b18ff1df59dbd350a8f5ea7d96fecaafd9e1d4ea.log
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/compilers/gcc.rb:19:in `compile'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/c.rb:101:in `shared_object'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders.rb:80:in `block in build'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders.rb:77:in `instance_eval'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders.rb:77:in `build'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/inliner.rb:44:in `singleton_inline'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/inliner.rb:29:in `inline'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/tesseract-ocr-0.1.3/lib/tesseract/c.rb:34:in `<module:C>'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/tesseract-ocr-0.1.3/lib/tesseract/c.rb:31:in `<module:Tesseract>'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/tesseract-ocr-0.1.3/lib/tesseract/c.rb:29:in `<top (required)>'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/tesseract-ocr-0.1.3/lib/tesseract/api.rb:26:in `<top (required)>'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/tesseract-ocr-0.1.3/lib/tesseract-ocr.rb:35:in `<top (required)>'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from (irb):2

The results of that log look like:

/var/folders/Iq/IqwSThA+GH4e2Qhf3ycWO++++TI/-Tmp-/.ffi-inliner-501/b18ff1df59dbd350a8f5ea7d96fecaafd9e1d4ea.cpp:1:30: error: tesseract/strngs.h: No such file or directory
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: expected constructor, destructor, or type conversion before ‘*’ token
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: variable or field ‘destroy_string’ declared void
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘STRING’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘value’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘STRING’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘value’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: expected ‘,’ or ‘;’ before ‘{’ token
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘STRING’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘value’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: expected ‘,’ or ‘;’ before ‘{’ token

I believe I've correctly pulled down your version of ffi-inline and installed it but that could be the sticking point. I also noticed that a few commits back you changed the gemspec to no longer require ffi-inliner but ffi-inline instead yet in the c.rb file it's using ffi-inliner. I'm kind of a newb so maybe that isn't the problem.

Any thoughts?

meh commented 12 years ago

OK, so about require 'tesseract': it's somehow requiring the file in bin instead of the one in lib, so I guess it's expected that it fails.
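
To illustrate why a file in bin can win over the one in lib, here is a simplified sketch of load-path resolution. It assumes the gem's bin directory ended up ahead of lib on the load path; the thread does not confirm why that happened. The helper name `first_match_on_load_path` is hypothetical, and real `require` does more (other file extensions, RubyGems activation).

```ruby
# Hypothetical helper, not part of the gem: walks a load path in order and
# returns the first "<name>.rb" it finds, or nil when nothing matches.
# This is a simplification of what `require` does.
def first_match_on_load_path(name, load_path = $LOAD_PATH)
  load_path.each do |dir|
    candidate = File.join(dir.to_s, "#{name}.rb")
    return candidate if File.file?(candidate)
  end
  nil
end
```

Calling `first_match_on_load_path('tesseract')` in IRB should show which file a bare `require 'tesseract'` is likely to pick up first.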

About require 'tesseract-ocr': I'd say the tesseract installed through ports is either missing the headers altogether or missing the strngs.h header. Find out where it installed the headers and check that they're all present.

> ls /usr/include/tesseract
adaptive.h                      matchdefs.h
alignedblob.h                   matchtab.h
altlist.h                       matrix.h
ambigs.h                        measure.h
apitypes.h                      memblk.h
associate.h                     memryerr.h
baseapi.h                       memry.h
basedir.h                       mergenf.h
baseline.h                      mfcpch.h
bbgrid.h                        mfdefs.h
beam_search.h                   mf.h
bestfirst.h                     mfoutline.h
bits16.h                        mfx.h
bitvec.h                        mod128.h
blckerr.h                       ndminx.h
blkocc.h                        neural_net.h
blobbox.h                       neuron.h
blobclass.h                     normalis.h
blobs.h                         normfeat.h
blread.h                        normmatch.h
bmp_8.h                         notdll.h
boxread.h                       nwmain.h
boxword.h                       ocrblock.h
cached_file.h                   ocrclass.h
callcpp.h                       ocrfeatures.h
ccstruct.h                      ocrrow.h
ccutil.h                        oldbasel.h
char_altlist.h                  oldheap.h
char_bigrams.h                  oldlist.h
char_samp_enum.h                olutil.h
char_samp.h                     osdetect.h
char_samp_set.h                 otsuthr.h
char_set.h                      outfeat.h
chartoname.h                    outlines.h
chop.h                          output.h
chopper.h                       pageiterator.h
classifier_base.h               pageres.h
classifier_factory.h            paramsd.h
classify.h                      params.h
closed.h                        pdblock.h
clst.h                          permute.h
cluster.h                       pgedit.h
clusttool.h                     picofeat.h
colfind.h                       pieces.h
colpartitiongrid.h              pithsync.h
colpartition.h                  pitsync1.h
colpartitionset.h               platform.h
commontraining.h                plotedges.h
con_comp.h                      plotseg.h
const.h                         points.h
control.h                       polyaprx.h
conv_net_classifier.h           polyblk.h
coutln.h                        protos.h
crakedge.h                      publictypes.h
cube_const.h                    qrsequence.h
cube_line_object.h              quadlsq.h
cube_line_segmenter.h           quadratc.h
cube_object.h                   quspline.h
cube_reco_context.h             ratngs.h
cube_search_object.h            rect.h
cube_tuning_params.h            rejctmap.h
cube_utils.h                    reject.h
cutil_class.h                   render.h
cutil.h                         resultiterator.h
cutoffs.h                       scaleimg.h
danerror.h                      scanedg.h
dawg.h                          scrollview.h
detlinefit.h                    seam.h
devanagari_processing.h         search_column.h
dict.h                          search_node.h
docqual.h                       search_object.h
dppoint.h                       secname.h
drawedg.h                       serialis.h
drawfx.h                        sortflts.h
drawtord.h                      sorthelper.h
edgblob.h                       speckle.h
edgloop.h                       split.h
efio.h                          states.h
elst2.h                         statistc.h
elst.h                          stderr.h
emalloc.h                       stepblob.h
errcode.h                       stopper.h
extern.h                        string_32.h
extract.h                       strngs.h
featdefs.h                      strokewidth.h
feature_base.h                  structures.h
feature_bmp.h                   svmnode.h
feature_chebyshev.h             svshowim.h
feature_hybrid.h                svutil.h
fileerr.h                       tabfind.h
findseam.h                      tablefind.h
fixspace.h                      tablerecog.h
flexfx.h                        tabvector.h
float2int.h                     tally.h
fpchop.h                        tessarray.h
fpoint.h                        tessbox.h
freelist.h                      tesscallback.h
fxdefs.h                        tessdatamanager.h
gap_map.h                       tessedit.h
genblob.h                       tesseractclass.h
genericvector.h                 tesseract_cube_combiner.h
globaloc.h                      tesseractmain.h
globals.h                       tess_lang_mod_edge.h
gradechop.h                     tess_lang_model.h
hashfn.h                        tessopt.h
helpers.h                       tessvars.h
host.h                          textord.h
hosthplb.h                      tfacep.h
hpddef.h                        tfacepp.h
hpdsizes.h                      thresholder.h
hybrid_neural_net_classifier.h  topitch.h
imagefind.h                     tordmain.h
image.h                         tovars.h
imgerrs.h                       tprintf.h
img.h                           trie.h
imgscale.h                      tuning_params.h
imgs.h                          underlin.h
imgtiff.h                       unichar.h
imgunpk.h                       unicharmap.h
input_file_buffer.h             unicharset.h
intfx.h                         unicity_table.h
intmatcher.h                    vecfuncs.h
intproto.h                      werd.h
ipoints.h                       werdit.h
kdtree.h                        word_altlist.h
lang_mod_edge.h                 wordclass.h
lang_model.h                    word_list_lang_model.h
language_model.h                wordrec.h
linefind.h                      wordseg.h
linlsq.h                        word_size_model.h
listio.h                        word_unigrams.h
lsterr.h                        workingpartset.h
makechop.h                      xform2d.h
makerow.h

This is the list of tesseract headers on my system.
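
A quick way to run the same check from Ruby is sketched below. The helper name `missing_headers` is hypothetical; the header path `tesseract/strngs.h` is the one named in the compile log, and the directories in the usage note are only common defaults, so adjust them for your system.

```ruby
# Hypothetical helper: given an include directory and a list of header paths
# relative to it, returns the headers that are NOT present there.
def missing_headers(include_dir, headers)
  headers.reject { |header| File.file?(File.join(include_dir, header)) }
end
```

For example, `missing_headers('/opt/local/include', ['tesseract/strngs.h'])` returns an empty array when the header is where the compiler would need it, and the same call against `/usr/include` or `/usr/local/include` shows whether it lives in one of the other usual places.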

emlama commented 12 years ago

Thanks for jumping in here to help. Sadly, I'm still not quite there. (By the way, I have been testing the command-line version of tesseract and it seems to be working great.)

So I looked into the MacPorts directory and it looks like I had all the headers that you listed above. Just for the heck of it, I went ahead and uninstalled Tesseract from MacPorts and built it from source.

This actually gave me a new error: this time I was missing a header file in Leptonica, which is still installed via MacPorts. My guess is that ffi-inliner somehow can't find the path to the headers in /opt/local/bin/.

I'm currently working on building Leptonica from source to see if that fixes the issues and will rebuild Tesseract once that is done. In the meantime, do you have any thoughts on why ffi-inliner wouldn't be able to see those headers?

Attached are the IRB error and the compile error:

1.9.2-head :002 > require 'tesseract-ocr'
CompilationError: compile error: see logs at /var/folders/Iq/IqwSThA+GH4e2Qhf3ycWO++++TI/-Tmp-/.ffi-inliner-501/256ff6410bc0f28f0e471477374b56e2eac6b775.log
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/compilers/gcc.rb:19:in `compile'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/c.rb:101:in `shared_object'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders.rb:80:in `block in build'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders.rb:77:in `instance_eval'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders.rb:77:in `build'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/inliner.rb:44:in `singleton_inline'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/inliner.rb:29:in `inline'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/tesseract-ocr-0.1.3/lib/tesseract/c/leptonica.rb:30:in `<module:Leptonica>'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/tesseract-ocr-0.1.3/lib/tesseract/c/leptonica.rb:27:in `<module:C>'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/tesseract-ocr-0.1.3/lib/tesseract/c/leptonica.rb:25:in `<module:Tesseract>'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/tesseract-ocr-0.1.3/lib/tesseract/c/leptonica.rb:25:in `<top (required)>'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/tesseract-ocr-0.1.3/lib/tesseract/c.rb:82:in `<top (required)>'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/tesseract-ocr-0.1.3/lib/tesseract/api.rb:26:in `<top (required)>'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from /Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/tesseract-ocr-0.1.3/lib/tesseract-ocr.rb:35:in `<top (required)>'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from (irb):2

/var/folders/Iq/IqwSThA+GH4e2Qhf3ycWO++++TI/-Tmp-/.ffi-inliner-501/256ff6410bc0f28f0e471477374b56e2eac6b775.cpp:1:34: error: leptonica/allheaders.h: No such file or directory
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/c.rb:57: error: ‘int32_t’ does not name a type
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: expected constructor, destructor, or type conversion before ‘*’ token
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: expected constructor, destructor, or type conversion before ‘*’ token
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: expected constructor, destructor, or type conversion before ‘*’ token
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘Pix’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘pix’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘uint8_t’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘data’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘size_t’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘size’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘Format’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: initializer expression list treated as compound expression
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: expected ‘,’ or ‘;’ before ‘{’ token
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: variable or field ‘pix_destroy’ declared void
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘Pix’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘pix’ was not declared in this scope
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘int32_t’ does not name a type
/Users/matthewforr/.rvm/gems/ruby-1.9.2-head@tesseract/gems/ffi-inline-0.0.2/lib/ffi/inliner/builders/cpp.rb:25: error: ‘int32_t’ does not name a type

meh commented 12 years ago

It's probably because gcc's include path doesn't include /opt/local/include. I'm gonna add CFLAGS/LDFLAGS support in ffi-inline so you can add the include path as an env var.

meh commented 12 years ago

OK, I pushed a new version of ffi-inline. Just export CFLAGS set to something like -I/opt/local/include, or whichever path contains the leptonica directory with the headers.

nbomberger commented 12 years ago

I am having this issue as well when I require tesseract. Can you explicitly tell me how to fix this issue? Not sure where to export CFLAGS - before I do the gem install tesseract? How can I tell if I have the dependencies in the right place? Any help is appreciated.

nbomberger commented 12 years ago

I installed the libraries using homebrew. Is there a way to tell the gem that the libraries are installed somewhere else? /usr/local/Cellar?

meh commented 12 years ago

> I am having this issue as well when I require tesseract. Can you explicitly tell me how to fix this issue? Not sure where to export CFLAGS - before I do the gem install tesseract? How can I tell if I have the dependencies in the right place? Any help is appreciated.

You can either do the following in the shell before running the application that uses tesseract:

export CFLAGS=-I/path/to/the/headers
export LDFLAGS=-L/path/to/the/libs/

Or set them in Ruby before requiring tesseract, like this:

ENV['CFLAGS'] = '-I/path/to/the/headers'
ENV['LDFLAGS'] = '-L/path/to/the/libs'
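
If the headers and libraries share one install prefix, the two variables can be derived from it. This is only a sketch: the helper names are hypothetical, and the prefixes mentioned (`/opt/local` for MacPorts, `/usr/local` for Homebrew) are assumptions about a default setup rather than something the gem detects.

```ruby
# Hypothetical helper: derives compiler and linker flags from an install
# prefix, e.g. "/opt/local" gives -I/opt/local/include and -L/opt/local/lib.
def build_flags(prefix)
  {
    'CFLAGS'  => "-I#{File.join(prefix, 'include')}",
    'LDFLAGS' => "-L#{File.join(prefix, 'lib')}"
  }
end

# Appends the flags to an environment (ENV by default), keeping any value
# that was already set instead of overwriting it.
def apply_build_flags(prefix, env = ENV)
  build_flags(prefix).each do |key, flag|
    env[key] = [env[key], flag].compact.reject(&:empty?).join(' ')
  end
  env
end
```

Calling `apply_build_flags('/opt/local')` before `require 'tesseract-ocr'` has the same effect as the two `ENV[...]` assignments, except that existing flags are preserved.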

nbomberger commented 12 years ago

works - thank you!

pgericson commented 7 years ago

Here is a solution for installing leptonica and tesseract on Red Hat, with the correct arguments to make the gem run.

It is inspired by this .sh install file, which did not work completely for me:

yum -y update
yum -y install libstdc++ autoconf automake libtool autoconf-archive pkg-config gcc gcc-c++ make libjpeg-devel libpng-devel libtiff-devel zlib-devel

# install autoconf-archive
wget ftp://mirror.switch.ch/pool/4/mirror/epel/7/ppc64/a/autoconf-archive-2016.09.16-1.el7.noarch.rpm
rpm -i autoconf-archive-2016.09.16-1.el7.noarch.rpm

# install leptonica from github
wget https://github.com/DanBloomberg/leptonica/archive/v1.72.tar.gz
tar -zxvf v1.72.tar.gz
cd leptonica-1.72
chmod +x configure # this is only if you have permission problems.
./configure --prefix=$HOME/local/
make
make install
cd ..

# install tesseract from github
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.01.tar.gz
tar -zxvf 3.04.01.tar.gz
cd tesseract-3.04.01/
./autogen.sh
LIBLEPT_HEADERSDIR=$HOME/local/include ./configure --prefix=$HOME/local/ --with-extra-libraries=$HOME/local/lib
make
make install
LD_LIBRARY_PATH=$HOME/local/lib CFLAGS=-I$HOME/local/include LDFLAGS=-L$HOME/local/lib bundle exec irb -r 'tesseract'
# you should see no errors
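
The final command can also be launched from a Ruby script (a deploy task, for instance). The sketch below assumes the same `local` prefix under the home directory as the script above; the helper names `local_build_env` and `run_with_local_build` are hypothetical.

```ruby
# Hypothetical helper: environment variables for a tesseract/leptonica build
# installed under <home_dir>/local.
def local_build_env(home_dir = Dir.home)
  prefix = File.join(home_dir, 'local')
  {
    'LD_LIBRARY_PATH' => File.join(prefix, 'lib'),
    'CFLAGS'          => "-I#{File.join(prefix, 'include')}",
    'LDFLAGS'         => "-L#{File.join(prefix, 'lib')}"
  }
end

# Runs a command in a child process with those variables added to its
# environment. Returns true when the command exits with status 0.
def run_with_local_build(*command, home_dir: Dir.home)
  system(local_build_env(home_dir), *command)
end
```

For example, `run_with_local_build('bundle', 'exec', 'irb', '-r', 'tesseract')` starts IRB with the variables applied to that child process only, leaving the parent's environment untouched.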

Any poor soul that finds this in the future, I hope this helps.

This will solve this issue and issue #32