Status: Open. p7r opened this issue 11 years ago.
Further to this I removed all tesseract libraries on my machine and reinstalled them and the tesseract-ocr gem.
It seems to work fine with English, but it can't find the other language files:
$ tesseract.rb
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:248:in `_setup': you have to set an image first (ArgumentError)
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:149:in `text_for'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:77:in `block in <top (required)>'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `tap'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'
$ tesseract.rb --help
Usage: tesseract [options]
--path PATH datapath to set
-l, --language LANGUAGE language to use
-m, --mode MODE mode to use
-p, --psm MODE page segmentation mode to use
-u, --unlv output in UNLV format
-c, --confidence output the mean confidence of the recognition
-C, --config PATH... config files to load
-b, --blacklist LIST blacklist the following chars
-w, --whitelist LIST whitelist the following chars
-s, --scale VALUE scale the image before analyzing it
-r, --resize VALUE resize the image before analyzing it
$ tesseract.rb -l ara image.png
Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `new'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'
$ tesseract.rb -l ara --path /usr/local/Cellar/tesseract/3.02.02/share image.png
Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `new'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'
$ tesseract.rb -l ara --path /usr/local/Cellar/tesseract/3.02.02/share/tessdata image.png
Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `new'
from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'
Pauls-Mac-mini:arabicocrtest paul$ tesseract.rb -l eng --path /usr/local/Cellar/tesseract/3.02.02/share/tessdata image.png
V L: _ i _ if __ r
., - 7-; f"::"'=:, ’
‘HQ.’ .9 9 " x_. ‘
' .' ”- « >3)’ »
'5--4 war; -11-! 2.! u-r‘J:“fi-&“‘->s’9":‘;’,,‘,’ .4» ma
The garbage output is expected: the only text in that image is Arabic.
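For anyone trying to reproduce this, a quick sanity check is to confirm that the language file actually exists under the directory passed to `--path`. The sketch below is a hypothetical helper (`traineddata_candidates` is a name made up here, not part of the tesseract-ocr gem). It assumes that Tesseract 3.x language data is stored as `<lang>.traineddata`, and it checks both the directory itself and a `tessdata` subdirectory, since it is not clear from the output above which of the two the gem expects.

```ruby
# Hypothetical diagnostic helper; NOT part of the tesseract-ocr gem.
# Assumption: language data is stored as "<lang>.traineddata", either
# directly in the given directory or in a "tessdata" subdirectory of it.
#
# Returns an array of [path, exists] pairs, one per candidate location.
def traineddata_candidates(datapath, lang)
  base = File.expand_path(datapath)

  candidates = [
    File.join(base, "#{lang}.traineddata"),            # datapath is tessdata itself
    File.join(base, "tessdata", "#{lang}.traineddata") # datapath is the parent of tessdata
  ]

  # Pair each candidate path with whether a regular file exists there.
  candidates.map { |path| [path, File.file?(path)] }
end
```

Calling `traineddata_candidates('/usr/local/Cellar/tesseract/3.02.02/share', 'ara')` returns one pair per candidate path. If every pair comes back `false`, the Arabic data is simply not installed in that location. If one is `true`, the file is present and the failure is more likely in how the data path reaches the library.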
I'll have a look very soon (likely toward the end of the weekend).
I'm very sorry I haven't looked into this yet. I've been very busy, but I promise I will as soon as I have time.
I have the same problem trying the Nerdz example in this repo. The :lol language is not loaded.
I think this is an OS X specific issue, and I don't have such a machine to fix this problem.
In my case the problem occurs on Ubuntu. If I change the :lol language (in the Nerdz example) to the default :en, everything works fine.
And it's the same error "the API did not Init correctly (RuntimeError)"
@juniorjp1989 oh, that's almost good to know then. I guess it's a problem with non-Arch Linux systems.
When trying to do tesseract.rb -l ara or when setting up an Engine as follows:
I'm getting this:
Tesseract itself is installed correctly: using the compiled binary that comes with the package, I am able to load the Arabic language files and get OCR output.
Any suggestions gratefully received.