meh / ruby-tesseract-ocr

A Ruby wrapper library to the tesseract-ocr API.
[BUG] Segmentation fault #57

Closed rajeevkannav closed 5 years ago

rajeevkannav commented 8 years ago

System Info :1234:

gem list tesseract-ocr -d

tesseract-ocr (0.1.8)
    Author: meh.
    Homepage: http://github.com/meh/ruby-tesseract-ocr
    License: BSD
    Installed at: /home/rajeev/.rvm/gems/ruby-2.0.0-p598

    A wrapper library to the tesseract-ocr API.

tesseract -v

tesseract 3.02.02
 leptonica-1.72
  libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.51 : libtiff 4.0.3 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0

ruby -v

ruby 2.0.0p598 (2014-11-13 revision 48408) [x86_64-linux]

require 'tesseract-ocr'
_image = '/home/rajeev/workspace/tess-ocr/tess-data/shots-1.jpg'
tesseract_engine_object = Tesseract::Engine.new {|e|
  e.language  = :eng
  e.blacklist = '|'
}

puts "tesseract_engine_object #{tesseract_engine_object.inspect}"
puts tesseract_engine_object.text_for(_image).strip

tesseract_engine_object #<Tesseract::Engine:0x00000001ae7aa8 @api=#<Tesseract::API:0x00000001ae7828 @internal=#>, @initializing=false, @init=#Proc:0x00000001ae79e0@main.rb:3, @path=nil, @language=:eng, @mode=:DEFAULT, @variables={"tessedit_char_blacklist"=>"|"}, @config=[], @rectangle=[]>
index >= 0 && index < size_used_:Error:Assert failed:in file ../ccutil/genericvector.h, line 512
/home/rajeev/.rvm/gems/ruby-2.0.0-p598/gems/tesseract-ocr-0.1.8/lib/tesseract/api.rb:157: [BUG] Segmentation fault
ruby 2.0.0p598 (2014-11-13 revision 48408) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0007 p:---- s:0028 e:000027 CFUNC :get_utf8_text
c:0006 p:---- s:0026 e:000025 IFUNC

[Complete log trace](https://gist.github.com/rajeevkannav/41d84e95a635c2f54dcf)

Used Image

@meh Any

kkaczmarczyk commented 5 years ago

This also happens to me with the latest version of eng.traineddata. Downloading an older version and replacing the file fixed the issue for me. You can get the older version from https://github.com/tesseract-ocr/tessdata/blob/3.04.00/eng.traineddata by running: wget https://github.com/tesseract-ocr/tessdata/raw/3.04.00/eng.traineddata and then saving it in your tessdata folder (e.g. /usr/share/tessdata).
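For anyone who prefers to script the replacement step, here is a minimal sketch in Ruby. It assumes the older eng.traineddata has already been downloaded with the wget command above. The helper name `replace_traineddata` and the `.bak` backup suffix are my own choices for illustration and are not part of the tesseract-ocr gem. The script only copies files, so you may need to run it with enough permissions to write to the tessdata folder.

```ruby
require 'fileutils'

# Hypothetical helper (not part of the tesseract-ocr gem): copy an
# already-downloaded traineddata file into the tessdata folder, keeping
# a one-time backup of whatever file was there before.
def replace_traineddata(downloaded, tessdata_dir)
  target = File.join(tessdata_dir, File.basename(downloaded))
  backup = "#{target}.bak"

  # Back up the current file once, so the swap can be undone later.
  FileUtils.cp(target, backup) if File.exist?(target) && !File.exist?(backup)

  # Overwrite the installed file with the older download.
  FileUtils.cp(downloaded, target)
  target
end

# Example call (paths are assumptions, adjust them to your system):
# replace_traineddata('eng.traineddata', '/usr/share/tessdata')
```

If the older file does not help, copying the `.bak` file back over the target restores the previous state.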

rajeevkannav commented 5 years ago

@kkaczmarczyk Yup, the same fix worked for me a long time ago. I never updated this issue with that information.

Thank you for bumping; marking it as closed.