meh / ruby-tesseract-ocr

A Ruby wrapper library to the tesseract-ocr API.

Compilation Error when requiring tesseract #63

Open anirudhsundar98 opened 7 years ago

anirudhsundar98 commented 7 years ago

I've been trying to fix this error for quite some time now, but nothing seems to work. The weird thing is that the gem used to work for me: I used it on quite a few images and it produced pretty good results. Then one day, requiring the gem just started throwing compilation errors. I've read through the other Compilation Error posts, but either the problem there is different or the solution didn't work for me.

Ruby version:

~$ ruby -v
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]

Tesseract version:

~$ tesseract -v
tesseract 4.00.00alpha
 leptonica-1.74.1
  libjpeg 6b (libjpeg-turbo 1.3.1) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8

 Found AVX
 Found SSE

OS: Debian GNU/Linux 8 (jessie)

I built the leptonica and tesseract libraries from source and the tesseract command line tool works beautifully.

Here's the error output within irb:

>> require 'tesseract'
CompilationError: compile error: see logs at /tmp/.ffi-inline-1000/f51a6a90c87c6cb93b85acfe99e9be14599d54dd.log
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/ffi-inline-0.0.4.3/lib/ffi/inline/compilers/gcc.rb:35:in `compile'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/ffi-inline-0.0.4.3/lib/ffi/inline/builders/c.rb:114:in `shared_object'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/ffi-inline-0.0.4.3/lib/ffi/inline/builders.rb:90:in `block in build'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/ffi-inline-0.0.4.3/lib/ffi/inline/builders.rb:87:in `instance_eval'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/ffi-inline-0.0.4.3/lib/ffi/inline/builders.rb:87:in `build'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/ffi-inline-0.0.4.3/lib/ffi/inline/inline.rb:54:in `singleton_inline'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/ffi-inline-0.0.4.3/lib/ffi/inline/inline.rb:39:in `inline'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c.rb:34:in `<module:C>'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c.rb:31:in `<module:Tesseract>'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c.rb:29:in `<top (required)>'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/2.4.0/rubygems/core_ext/kernel_require.rb:68:in `require'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/2.4.0/rubygems/core_ext/kernel_require.rb:68:in `require'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/api.rb:26:in `<top (required)>'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/2.4.0/rubygems/core_ext/kernel_require.rb:68:in `require'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/2.4.0/rubygems/core_ext/kernel_require.rb:68:in `require'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract-ocr.rb:35:in `<top (required)>'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/2.4.0/rubygems/core_ext/kernel_require.rb:68:in `require'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/2.4.0/rubygems/core_ext/kernel_require.rb:68:in `require'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract.rb:25:in `<top (required)>'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/2.4.0/rubygems/core_ext/kernel_require.rb:133:in `require'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/2.4.0/rubygems/core_ext/kernel_require.rb:133:in `rescue in require'
    from /home/anirudh/.rbenv/versions/2.4.0/lib/ruby/2.4.0/rubygems/core_ext/kernel_require.rb:40:in `require'
    from (irb):1
    from /home/anirudh/.rbenv/versions/2.4.0/bin/irb:11:in `<main>'
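
The trace itself only says where the compiler log lives. A minimal diagnostic sketch, assuming the exception message keeps the `see logs at <path>` wording shown above: the hypothetical helpers `ffi_inline_log_path` and `require_tesseract_with_log` (not part of the gem or of ffi-inline) extract the path from the message and print the log before re-raising. It rescues `Exception` because it is an assumption, not confirmed here, that ffi-inline's `CompilationError` inherits from `StandardError`.

```ruby
# Hypothetical helper: pull the ffi-inline log path out of an error message
# such as "compile error: see logs at /tmp/.ffi-inline-1000/<hash>.log".
# Returns nil when the message contains no log path.
def ffi_inline_log_path(message)
  message[/see logs at (\S+\.log)/, 1]
end

# Hypothetical wrapper around `require 'tesseract'` that dumps the g++ log
# to stderr when compilation fails, then re-raises the original error.
def require_tesseract_with_log
  require 'tesseract'
rescue Exception => e # assumption: CompilationError may not derive from StandardError
  path = ffi_inline_log_path(e.message.to_s)
  if path && File.exist?(path)
    warn "ffi-inline compiler log (#{path}):"
    warn File.read(path)
  end
  raise
end
```

Calling `require_tesseract_with_log` in irb instead of `require 'tesseract'` should print the same g++ output that appears in the log file, followed by the original error.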

Log file:

g++ -shared -fPIC    -o /tmp/.ffi-inline-1000/f51a6a90c87c6cb93b85acfe99e9be14599d54dd.so /tmp/.ffi-inline-1000/f51a6a90c87c6cb93b85acfe99e9be14599d54dd.cpp -ltesseract 2>>/tmp/.ffi-inline-1000/f51a6a90c87c6cb93b85acfe99e9be14599d54dd.log
In file included from /usr/include/c++/4.9/cinttypes:35:0,
                 from /usr/local/include/tesseract/host.h:30,
                 from /usr/local/include/tesseract/memry.h:24,
                 from /usr/local/include/tesseract/strngs.h:27,
                 from /tmp/.ffi-inline-1000/f51a6a90c87c6cb93b85acfe99e9be14599d54dd.cpp:1:
/usr/include/c++/4.9/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support for the \
  ^
In file included from /usr/local/include/tesseract/memry.h:24:0,
                 from /usr/local/include/tesseract/strngs.h:27,
                 from /tmp/.ffi-inline-1000/f51a6a90c87c6cb93b85acfe99e9be14599d54dd.cpp:1:
/usr/local/include/tesseract/host.h:34:9: error: ‘int8_t’ does not name a type
 typedef int8_t inT8;
         ^
/usr/local/include/tesseract/host.h:35:9: error: ‘uint8_t’ does not name a type
 typedef uint8_t uinT8;
         ^
/usr/local/include/tesseract/host.h:36:9: error: ‘int16_t’ does not name a type
 typedef int16_t inT16;
         ^
/usr/local/include/tesseract/host.h:37:9: error: ‘uint16_t’ does not name a type
 typedef uint16_t uinT16;
         ^
/usr/local/include/tesseract/host.h:38:9: error: ‘int32_t’ does not name a type
 typedef int32_t inT32;
         ^
/usr/local/include/tesseract/host.h:39:9: error: ‘uint32_t’ does not name a type
 typedef uint32_t uinT32;
         ^
/usr/local/include/tesseract/host.h:40:9: error: ‘int64_t’ does not name a type
 typedef int64_t inT64;
         ^
/usr/local/include/tesseract/host.h:41:9: error: ‘uint64_t’ does not name a type
 typedef uint64_t uinT64;
         ^
In file included from /usr/local/include/tesseract/strngs.h:27:0,
                 from /tmp/.ffi-inline-1000/f51a6a90c87c6cb93b85acfe99e9be14599d54dd.cpp:1:
/usr/local/include/tesseract/memry.h:27:27: error: ‘inT32’ was not declared in this scope
 extern char *alloc_string(inT32 count);
                           ^
/usr/local/include/tesseract/memry.h:31:24: error: ‘inT32’ was not declared in this scope
 extern void *alloc_mem(inT32 count);
                        ^
/usr/local/include/tesseract/memry.h:33:30: error: ‘inT32’ was not declared in this scope
 extern void *alloc_big_zeros(inT32 count);
                              ^
In file included from /tmp/.ffi-inline-1000/f51a6a90c87c6cb93b85acfe99e9be14599d54dd.cpp:1:0:
/usr/local/include/tesseract/strngs.h:68:5: error: ‘inT32’ does not name a type
     inT32 length() const;
     ^
/usr/local/include/tesseract/strngs.h:69:5: error: ‘inT32’ does not name a type
     inT32 size() const { return length(); }
     ^
/usr/local/include/tesseract/strngs.h:71:5: error: ‘uinT32’ does not name a type
     uinT32 unsigned_size() const {
     ^
/usr/local/include/tesseract/strngs.h:90:23: error: declaration of ‘operator[]’ as non-function
     char &operator[] (inT32 index) const;
                       ^
/usr/local/include/tesseract/strngs.h:90:20: error: expected ‘;’ at end of member declaration
     char &operator[] (inT32 index) const;
                    ^
/usr/local/include/tesseract/strngs.h:90:29: error: expected ‘)’ before ‘index’
     char &operator[] (inT32 index) const;
                             ^
/usr/local/include/tesseract/strngs.h:93:22: error: ‘inT32’ has not been declared
     void truncate_at(inT32 index);
                      ^
/usr/local/include/tesseract/strngs.h:121:24: error: ‘inT32’ has not been declared
     inline void ensure(inT32 min_capacity) { ensure_cstr(min_capacity); }
                        ^
/usr/local/include/tesseract/strngs.h:174:11: error: expected ‘;’ at end of member declaration
     char* ensure_cstr(inT32 min_capacity);
           ^
/usr/local/include/tesseract/strngs.h:174:29: error: expected ‘)’ before ‘min_capacity’
     char* ensure_cstr(inT32 min_capacity);
                             ^
/usr/local/include/tesseract/strngs.h: In member function ‘char* STRING::strdup() const’:
/usr/local/include/tesseract/strngs.h:80:6: error: ‘inT32’ was not declared in this scope
      inT32 len = length() + 1;
      ^
/usr/local/include/tesseract/strngs.h:81:30: error: ‘len’ was not declared in this scope
      return strncpy(new char[len], GetCStr(), len);
                              ^
/usr/local/include/tesseract/strngs.h: In member function ‘void STRING::ensure(int)’:
/usr/local/include/tesseract/strngs.h:121:70: error: expression cannot be used as a function
     inline void ensure(inT32 min_capacity) { ensure_cstr(min_capacity); }
                                                                      ^
/home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c.rb: In function ‘int string_length(STRING*)’:
/home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c.rb:70:19: error: ‘class STRING’ has no member named ‘length’
     return value->length();
                   ^

require 'tesseract-ocr' produces the same error.

Looking at the log file, I noticed from the first part that stdint.h seems to be missing. But even after adding that header file, the error doesn't change.

List of headers:

~$ ls /usr/include/tesseract/
apitypes.h  errcode.h        host.h               ocrclass.h      platform.h        serialis.h      thresholder.h
baseapi.h   fileerr.h        ltrresultiterator.h  osdetect.h      publictypes.h     stdint.h        unichar.h
basedir.h   genericvector.h  memry.h              pageiterator.h  renderer.h        strngs.h        unicharmap.h
capi.h      helpers.h        ndminx.h             params.h        resultiterator.h  tesscallback.h  unicharset.h

Please help.

UPDATE:

So I removed the Tesseract version I had and built the 3.05.00 release. I still get a compilation error, but this time the contents of the log file are different.

Log File:

g++ -shared -fPIC    -o /tmp/.ffi-inline-1000/62fdaa4c6a687e4af4b38b4e4d736ed31c2c9031.so /tmp/.ffi-inline-1000/62fdaa4c6a687e4af4b38b4e4d736ed31c2c9031.cpp -ltesseract 2>>/tmp/.ffi-inline-1000/62fdaa4c6a687e4af4b38b4e4d736ed31c2c9031.log
/home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb: In function ‘void set_image(tesseract::TessBaseAPI*, const Pix*)’:
/home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb:171:22: error: invalid conversion from ‘const Pix*’ to ‘Pix*’ [-fpermissive]
     api->SetImage(pix);
                      ^
In file included from /tmp/.ffi-inline-1000/62fdaa4c6a687e4af4b38b4e4d736ed31c2c9031.cpp:1:0:
/usr/local/include/tesseract/baseapi.h:353:8: note: initializing argument 1 of ‘void tesseract::TessBaseAPI::SetImage(Pix*)’
   void SetImage(Pix* pix);
        ^
/home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb: In function ‘bool process_pages(tesseract::TessBaseAPI*, const char*, STRING*)’:
/home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb:183:55: error: no matching function for call to ‘tesseract::TessBaseAPI::ProcessPages(const char*&, NULL, int, STRING*&)’
     return api->ProcessPages(filename, NULL, 0, output);
                                                       ^
/home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb:183:55: note: candidate is:
In file included from /tmp/.ffi-inline-1000/62fdaa4c6a687e4af4b38b4e4d736ed31c2c9031.cpp:1:0:
/usr/local/include/tesseract/baseapi.h:537:8: note: bool tesseract::TessBaseAPI::ProcessPages(const char*, const char*, int, tesseract::TessResultRenderer*)
   bool ProcessPages(const char* filename, const char* retry_config,
        ^
/usr/local/include/tesseract/baseapi.h:537:8: note:   no known conversion for argument 4 from ‘STRING*’ to ‘tesseract::TessResultRenderer*’
/home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb: In function ‘bool process_page(tesseract::TessBaseAPI*, Pix*, int, const char*, STRING*)’:
/home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb:189:71: error: no matching function for call to ‘tesseract::TessBaseAPI::ProcessPage(Pix*&, int&, const char*&, NULL, int, STRING*&)’
     return api->ProcessPage(pix, page_index, filename, NULL, 0, output);
                                                                       ^
/home/anirudh/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb:189:71: note: candidate is:
In file included from /tmp/.ffi-inline-1000/62fdaa4c6a687e4af4b38b4e4d736ed31c2c9031.cpp:1:0:
/usr/local/include/tesseract/baseapi.h:552:8: note: bool tesseract::TessBaseAPI::ProcessPage(Pix*, int, const char*, const char*, int, tesseract::TessResultRenderer*)
   bool ProcessPage(Pix* pix, int page_index, const char* filename,
        ^
/usr/local/include/tesseract/baseapi.h:552:8: note:   no known conversion for argument 6 from ‘STRING*’ to ‘tesseract::TessResultRenderer*’

Tesseract Version:

~$ tesseract -v
tesseract 3.05.00
 leptonica-1.74.1
  libjpeg 6b (libjpeg-turbo 1.3.1) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8
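
The 3.05 errors point at API changes rather than a broken install: `SetImage` takes a non-const `Pix*`, and `ProcessPages`/`ProcessPage` take a `TessResultRenderer*` where tesseract-ocr 0.1.8 passes a `STRING*`. A sketch for checking which `ProcessPages` variant the installed headers declare before rebuilding anything; the helper names and the two header locations are assumptions taken from the paths in these logs.

```ruby
# Assumed header locations, taken from the paths in the logs above.
# /usr/local/include is listed first because GCC searches it before /usr/include.
HEADER_CANDIDATES = %w[
  /usr/local/include/tesseract/baseapi.h
  /usr/include/tesseract/baseapi.h
].freeze

# Hypothetical helper: classify the output parameter of ProcessPages.
# Returns :renderer (3.04+ style), :string (the STRING* form the wrapper
# calls), or :unknown. [^)] also matches newlines, so a declaration split
# across lines is still captured.
def process_pages_output_type(header_text)
  decl = header_text[/bool\s+ProcessPages\s*\([^)]*\)/]
  return :unknown unless decl
  return :renderer if decl.include?('TessResultRenderer')
  return :string if decl.include?('STRING')

  :unknown
end

# Hypothetical helper: inspect the first header that exists on this machine.
def installed_process_pages_output_type
  path = HEADER_CANDIDATES.find { |p| File.exist?(p) }
  path ? process_pages_output_type(File.read(path)) : :unknown
end
```

A result of `:renderer` means the headers use the newer signature from the logs above, which the 0.1.8 wrapper's call cannot match; `:string` corresponds to the signature that call expects.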

codemilan commented 7 years ago

Any luck with the error in the updated section? I'm facing the same problem too.

hari-haran commented 7 years ago

This worked for me: https://github.com/meh/ruby-tesseract-ocr/issues/50#issuecomment-327005723

tinbka commented 6 years ago

I'm having the same problem.

$> uname -r
4.18.5-041805-generic
$> ruby -v
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
$> tesseract -v
tesseract 3.04.01
 leptonica-1.73
  libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.2

The fix from #50 does not work for me (or rather, I don't understand how to apply it), since I'm on Ubuntu 16.04, not a Mac.

The log says:

g++ -shared -fPIC    -o /tmp/.ffi-inline-1000/da27b7b6426451f95823530285eb4150cb4f91cc.so /tmp/.ffi-inline-1000/da27b7b6426451f95823530285eb4150cb4f91cc.cpp -ltesseract 2>>/tmp/.ffi-inline-1000/da27b7b6426451f95823530285eb4150cb4f91cc.log
/home/shinku/.rvm/gems/ruby-2.5.1@global/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb: In function ‘void set_image(tesseract::TessBaseAPI*, const Pix*)’:
/home/shinku/.rvm/gems/ruby-2.5.1@global/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb:171:22: error: invalid conversion from ‘const Pix*’ to ‘Pix*’ [-fpermissive]
     api->SetImage(pix);
                      ^
In file included from /tmp/.ffi-inline-1000/da27b7b6426451f95823530285eb4150cb4f91cc.cpp:1:0:
/usr/include/tesseract/baseapi.h:356:8: note:   initializing argument 1 of ‘void tesseract::TessBaseAPI::SetImage(Pix*)’
   void SetImage(Pix* pix);
        ^
/home/shinku/.rvm/gems/ruby-2.5.1@global/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb: In function ‘bool process_pages(tesseract::TessBaseAPI*, const char*, STRING*)’:
/home/shinku/.rvm/gems/ruby-2.5.1@global/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb:183:55: error: no matching function for call to ‘tesseract::TessBaseAPI::ProcessPages(const char*&, NULL, int, STRING*&)’
     return api->ProcessPages(filename, NULL, 0, output);
                                                       ^
In file included from /tmp/.ffi-inline-1000/da27b7b6426451f95823530285eb4150cb4f91cc.cpp:1:0:
/usr/include/tesseract/baseapi.h:541:8: note: candidate: bool tesseract::TessBaseAPI::ProcessPages(const char*, const char*, int, tesseract::TessResultRenderer*)
   bool ProcessPages(const char* filename, const char* retry_config,
        ^
/usr/include/tesseract/baseapi.h:541:8: note:   no known conversion for argument 4 from ‘STRING*’ to ‘tesseract::TessResultRenderer*’
/home/shinku/.rvm/gems/ruby-2.5.1@global/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb: In function ‘bool process_page(tesseract::TessBaseAPI*, Pix*, int, const char*, STRING*)’:
/home/shinku/.rvm/gems/ruby-2.5.1@global/gems/tesseract-ocr-0.1.8/lib/tesseract/c/baseapi.rb:189:71: error: no matching function for call to ‘tesseract::TessBaseAPI::ProcessPage(Pix*&, int&, const char*&, NULL, int, STRING*&)’
     return api->ProcessPage(pix, page_index, filename, NULL, 0, output);
                                                                       ^
In file included from /tmp/.ffi-inline-1000/da27b7b6426451f95823530285eb4150cb4f91cc.cpp:1:0:
/usr/include/tesseract/baseapi.h:556:8: note: candidate: bool tesseract::TessBaseAPI::ProcessPage(Pix*, int, const char*, const char*, int, tesseract::TessResultRenderer*)
   bool ProcessPage(Pix* pix, int page_index, const char* filename,
        ^
/usr/include/tesseract/baseapi.h:556:8: note:   no known conversion for argument 6 from ‘STRING*’ to ‘tesseract::TessResultRenderer*’

ShamoX commented 6 years ago

On Mac, Homebrew has offered only Tesseract 3.05 for several years, and this Ruby wrapper doesn't support Tesseract versions newer than 3.03. On Linux, we pin the installed Tesseract version. For Ubuntu:

apt-get install tesseract-ocr=3.03.02-3 tesseract-ocr-eng=3.02-2 \
tesseract-ocr-osd=3.02-2 tesseract-ocr-equ=3.02-2 \
tesseract-ocr-fra=3.02-2 libtesseract-dev=3.03.02-3 libtesseract3=3.03.02-3

For Debian:

apt-get install tesseract-ocr=3.03.03-1 tesseract-ocr-eng=3.02-2 \
tesseract-ocr-osd=3.02-2 tesseract-ocr-equ=3.02-2 \
tesseract-ocr-fra=3.02-2 libtesseract-dev=3.03.03-1 libtesseract3=3.03.03-1

We also need the French language data, and there is a naming error, so we run this script: https://gist.github.com/ShamoX/49143fa1b1a539fcd641b0c009c0f579

We have been doing this since 2014, so we would appreciate it if #50 were accepted and a new version released.
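
If the comment above is right that the wrapper only works with Tesseract 3.03 and earlier, a version check can fail fast before ffi-inline tries to compile anything. A minimal sketch: the 3.03 cutoff comes from that comment and is not confirmed by the gem itself, and the helper names are hypothetical. The sketch merges stdout and stderr because some Tesseract releases may print the `-v` banner on stderr.

```ruby
require 'open3'

# Assumed cutoff, taken from the comment above: Tesseract <= 3.03.
MAX_SUPPORTED = [3, 3].freeze

# Hypothetical helper: extract [major, minor] from `tesseract -v` output,
# e.g. "tesseract 3.05.00" -> [3, 5]. Returns nil if no version is found.
def tesseract_major_minor(version_output)
  m = version_output.match(/tesseract\s+(\d+)\.(\d+)/)
  m && [m[1].to_i, m[2].to_i]
end

# Hypothetical helper: true only for a parsed version at or below the cutoff.
def supported_by_wrapper?(major_minor)
  !major_minor.nil? && (major_minor <=> MAX_SUPPORTED) <= 0
end

# Runs the installed binary; stdout and stderr are merged into one string.
def installed_tesseract_version
  output, _status = Open3.capture2e('tesseract', '-v')
  tesseract_major_minor(output)
end
```

With the versions reported in this thread, 3.04.01, 3.05.00 and 4.00.00alpha would all be rejected, which matches the failures described above.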

KINGSABRI commented 5 years ago

Same issue here, tested on:

 lsb_release -a
No LSB modules are available.
Distributor ID: Kali
Description:    Kali GNU/Linux Rolling
Release:        2019.1
Codename:       n/a

Ruby:

ruby -v
ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux-gnu]

Also tested on a Mac, with the same result.