meh / ruby-tesseract-ocr

A Ruby wrapper library to the tesseract-ocr API.

Problem with using other languages on OS X with tesseract installed with brew #23

Open p7r opened 11 years ago

p7r commented 11 years ago

When trying to do tesseract.rb -l ara or when setting up an Engine as follows:

tesseract = Tesseract::Engine.new { |e|
  # Note: this fails for multiple values of e.path, and with no value at all
  e.path = "/usr/local/Cellar/tesseract/3.02.02/share/"
  e.language = :ara
}

I'm getting this:

Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from ./img2txt.rb:14:in `new'
    from ./img2txt.rb:14:in `<main>'

Tesseract itself is installed correctly: using the compiled binary that comes in the package, I am able to load the Arabic language files and get OCR output.

Any suggestions gratefully received.

p7r commented 11 years ago

Further to this, I removed all tesseract libraries on my machine, then reinstalled them along with the tesseract-ocr gem.

It seems to work fine with English, but it can't find the Arabic language files:

$ tesseract.rb 
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:248:in `_setup': you have to set an image first (ArgumentError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:149:in `text_for'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:77:in `block in <top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `tap'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'

$ tesseract.rb --help
Usage: tesseract [options]
        --path PATH                  datapath to set
    -l, --language LANGUAGE          language to use
    -m, --mode MODE                  mode to use
    -p, --psm MODE                   page segmentation mode to use
    -u, --unlv                       output in UNLV format
    -c, --confidence                 output the mean confidence of the recognition
    -C, --config PATH...             config files to load
    -b, --blacklist LIST             blacklist the following chars
    -w, --whitelist LIST             whitelist the following chars
    -s, --scale VALUE                scale the image before analyzing it
    -r, --resize VALUE               resize the image before analyzing it

$ tesseract.rb -l ara image.png
Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `new'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'

$ tesseract.rb -l ara --path /usr/local/Cellar/tesseract/3.02.02/share image.png
Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `new'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'

$ tesseract.rb -l ara --path /usr/local/Cellar/tesseract/3.02.02/share/tessdata image.png
Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `new'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'

$ tesseract.rb -l eng --path /usr/local/Cellar/tesseract/3.02.02/share/tessdata image.png
V L: _ i _ if __ r
., - 7-; f"::"'=:,  ’
‘HQ.’ .9 9 " x_. ‘
' .' ”- « >3)’   »
'5--4 war; -11-!  2.! u-r‘J:“fi-&“‘->s’9":‘;’,,‘,’ .4» ma

The garbage output is expected: the only text in that image is Arabic.
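
One way to narrow this down is to check whether ara.traineddata actually sits where each candidate datapath would point. The following is only a rough sketch: check_datapaths and traineddata_file are hypothetical helpers, not part of the tesseract-ocr gem, and it assumes (as the Tesseract 3.x API describes it, to my understanding) that the datapath is the parent directory of tessdata. Whether the gem passes e.path through unchanged is also an assumption.

```ruby
# Hypothetical helper, not part of the tesseract-ocr gem.
# Returns the expected location of <lang>.traineddata for a given datapath,
# assuming the datapath is the parent directory of "tessdata".
def traineddata_file(datapath, lang)
  File.join(datapath, "tessdata", "#{lang}.traineddata")
end

# Maps each candidate datapath to true/false depending on whether the
# language file exists where that datapath would point.
def check_datapaths(candidates, lang)
  candidates.each_with_object({}) do |path, result|
    result[path] = File.file?(traineddata_file(path, lang))
  end
end

# The two paths tried in this thread.
candidates = [
  "/usr/local/Cellar/tesseract/3.02.02/share/",
  "/usr/local/Cellar/tesseract/3.02.02/share/tessdata",
]
# TESSDATA_PREFIX, when set, is another location worth checking.
candidates << ENV["TESSDATA_PREFIX"] if ENV["TESSDATA_PREFIX"]

check_datapaths(candidates, :ara).each do |path, found|
  puts "#{path}: ara.traineddata #{found ? 'found' : 'missing'}"
end
```

If every candidate reports the file as missing, the language data may simply not be installed under that prefix; if one reports it as found and the gem still fails, the path is probably not reaching the library in the form it expects.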

meh commented 11 years ago

I'll have a look very soon (likely toward the end of the weekend).

meh commented 11 years ago

I'm very sorry I haven't looked into this yet. I've been very busy, but I promise I will as soon as I have time.

juniorjp commented 10 years ago

I have the same problem trying the Nerdz example in this repo. The :lol language is not loaded.

meh commented 10 years ago

I think this is an OS X-specific issue, and I don't have such a machine to debug it on.

juniorjp commented 10 years ago

In my case the problem occurs on Ubuntu. If I change the :lol language (in the Nerdz example) to the default :en, everything works fine.

And it's the same error: "the API did not Init correctly (RuntimeError)".

meh commented 10 years ago

@juniorjp1989 oh, that's almost good to know then. I guess it's a problem with non-Arch Linux systems.