meh / ruby-tesseract-ocr

A Ruby wrapper library to the tesseract-ocr API.

unable to get back image or binary_image for any level #15

Closed: jronallo closed this issue 11 years ago.

jronallo commented 11 years ago

When I iterate over any level and try to get the image for that level, the returned object is nil (its class is NilClass).

require 'tesseract'

engine = Tesseract::Engine.new {|e|
  e.language  = :eng
  e.blacklist = '|'
  e.whitelist = [*'a'..'z', *'A'..'Z', *0..9, " ."].join
}

engine.each_line_for(filepath) do |line|
  puts line.image.class
end

This is using the latest code cloned from GitHub and tesseract 3.02 as packaged with Ubuntu 12.10.
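To check the "any level" claim in bulk rather than reading one `puts` per line, the loop above can be turned into a small diagnostic. This is a minimal sketch, not part of the gem: `lines_without_image` is a hypothetical helper, and it only assumes that the engine responds to `each_line_for` and yields objects that respond to `image`, as in the snippet above.

```ruby
# Hypothetical diagnostic helper (not part of ruby-tesseract-ocr).
# Returns the 0-based indices of the lines whose #image is nil.
# Assumes +engine+ responds to #each_line_for and yields line objects
# that respond to #image, as Tesseract::Engine does in the report above.
def lines_without_image(engine, path)
  missing = []
  index = 0
  engine.each_line_for(path) do |line|
    missing << index if line.image.nil?
    index += 1
  end
  missing
end
```

With the bug present, `lines_without_image(engine, filepath)` would list every line index; after a fix it should return an empty array.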

meh commented 11 years ago

Could you provide the image you're using, please? (The problem probably has nothing to do with the image itself, but that way we'll be on common ground when testing what's going wrong.)

jronallo commented 11 years ago

Here's a link to it. Note that we don't expect good OCR out of manuscript collections like this. I'm just experimenting with creating an interface to do some cleanup, so I need access to the images.

https://docs.google.com/file/d/0ByUq6R632zOwVHg5YVhYMnhPYU0/edit?usp=sharing

meh commented 11 years ago

Thanks, I'm taking a look.

meh commented 11 years ago

Try with the latest commit; it should be fixed.

Also, sorry for the current lack of thorough documentation; I'll do a documentation marathon this weekend.

jronallo commented 11 years ago

Thank you for fixing this! It now properly gives me the image I need. I can use line.image.to_blob to write out the image to a file for viewing later.
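Writing each line's image out for later viewing, as described above, could look roughly like the sketch below. `dump_line_images`, the `line-0000` file naming, and the `.png` extension are all hypothetical choices made for illustration; the extension in particular is an assumption about the format `to_blob` emits, so check it against your version of the gem. The helper only relies on `each_line_for`, `image`, and `to_blob`, which appear in this thread.

```ruby
require 'fileutils'

# Hypothetical helper (not part of ruby-tesseract-ocr): writes the image of
# every line found in +path+ to +out_dir+ as line-0000.png, line-0001.png, ...
# Assumes +engine+ responds to #each_line_for and that each line's #image
# responds to #to_blob. The default "png" extension is an assumption about
# the blob format. Returns the list of files written.
def dump_line_images(engine, path, out_dir, ext: 'png')
  FileUtils.mkdir_p(out_dir)
  written = []
  index = 0
  engine.each_line_for(path) do |line|
    image = line.image
    # Before the fix #image returned nil; skip such lines instead of crashing.
    unless image.nil?
      file = File.join(out_dir, format('line-%04d.%s', index, ext))
      File.binwrite(file, image.to_blob) # binary mode: blobs are raw bytes
      written << file
    end
    index += 1
  end
  written
end
```

Usage, reusing the `engine` and `filepath` from the original report: `dump_line_images(engine, filepath, 'lines')`. Keeping the line index in the file name, even when a line is skipped, makes it easy to match a saved image back to its position on the page.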

I was able to find the #to_blob method by looking through the source, but little things like that would be great to have in the documentation. Thank you.

This might be a separate issue, but since I'm not sure it wasn't just introduced, I'll mention it here:

line.image.to_blob

Calling the above results in the following warning:

tesseract-ocr-0.1.5/lib/tesseract/api/image.rb:76: warning: calling free on non allocated pointer #<FFI::Pointer address=0x000000024b8a10> from tesseract-ocr-0.1.5/lib/tesseract/api/image.rb:76:in `to_blob'

meh commented 11 years ago

Just noticed that. It wasn't introduced by this fix; it only surfaced because getting the image now works. I'm looking into it.