meh / ruby-tesseract-ocr

A Ruby wrapper library to the tesseract-ocr API.

Crash when attempting to OCR image #16

Closed mikepfirrmann closed 11 years ago

mikepfirrmann commented 11 years ago

When attempting to use the tesseract-ocr gem to OCR the attached image, Ruby crashes.

Demo script Crash dump

Image: breaks

Using Ubuntu 12.04, with

$ tesseract -v
tesseract 3.02.02
 leptonica-1.69
  libgif 4.1.6 : libjpeg 8b : libpng 1.2.46 : libtiff 3.9.5 : zlib 1.2.3.4

If I run ImageMagick's identify -verbose /path/to/image.png, it reports the Orientation as being "Undefined". Based on the stack dump, I suspect that inability to handle images without defined orientations may be the problem.

meh commented 11 years ago

I'm fairly sure it's an upstream bug; you should open an issue on the Tesseract bug tracker.

meh commented 11 years ago

I verified it's an upstream bug: if I omit the call to it->Orientation(&result.orientation, &result.writing_direction, &result.textline_order, &result.deskew_angle), it doesn't crash.

A temporary workaround is to pass a block instead of having the method return an array, and then to avoid calling orientation on the object passed to the block.

When you don't pass a block, the gem caches all of the element's data up front (orientation included), because the element is only valid while the iterator is alive.
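
To make the difference between the two call styles concrete, here is a rough, self-contained simulation of the behaviour described above. It does not use the gem at all: SketchElement, SketchIterator and its each_block method are hypothetical stand-ins written for this illustration, and the raised error merely simulates the upstream crash in Orientation().

```ruby
# Hypothetical stand-in for an element backed by a native iterator.
# This is NOT the gem's real class, only an illustration.
class SketchElement
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # Simulates the upstream call that crashes on the problematic image.
  def orientation
    raise "simulated upstream crash in Orientation()"
  end
end

# Hypothetical iterator wrapper showing the two call styles.
class SketchIterator
  def initialize(texts)
    @texts = texts
  end

  # With a block: each element is handed to the caller and nothing is
  # read up front, so the caller decides which fields get touched.
  # Without a block: every field is cached eagerly, because the elements
  # are only valid while the iterator is alive.
  def each_block
    elements = @texts.map { |t| SketchElement.new(t) }
    if block_given?
      elements.each { |e| yield e }
    else
      elements.map { |e| { text: e.text, orientation: e.orientation } }
    end
  end
end

iterator = SketchIterator.new(["hello", "world"])

# Block form: only `text` is read, so the crashing call is never made.
texts = []
iterator.each_block { |block| texts << block.text }

# Array form: the eager cache touches `orientation` and fails.
array_form_failed = begin
  iterator.each_block
  false
rescue RuntimeError
  true
end
```

In this sketch the block form collects the text of both elements, while the array form fails as soon as the eager cache reads orientation. With the real gem the failure is a native crash rather than a Ruby exception, so it cannot be rescued; the only option is to not trigger the call.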