meh / ruby-tesseract-ocr

A Ruby wrapper library to the tesseract-ocr API.

Difference in output generated by gem and tesseract command line #40

Open Meenal-goyal opened 10 years ago

Meenal-goyal commented 10 years ago

I was trying to extract text from an image using the tesseract command line, but since I wanted to use a Ruby script I tried your gem. The problem is that the gem gives different output. In some cases the gem also does not perform on par with the command line and gives bad output. Is there a version difference? Additional info:

$ tesseract -v
tesseract 3.02.02
leptonica-1.69
libjpeg 8d : libpng 1.6.12 : zlib 1.2.5

What version is the gem using?

meh commented 10 years ago

The gem uses the version installed on the system.

Meenal-goyal commented 10 years ago

Then what's the reason for the different output? Is it possible that the gem uses an older version of tesseract installed on the system instead of the new one? I only have the latest version on my system, but maybe it includes support for older versions as well.

meh commented 10 years ago

No, that's not how it works. The only possible reason is different default options between the binary and the library.

Meenal-goyal commented 10 years ago

So, how can I change these options for the binary? Also, I want to set extra configuration variables like matcher_good_threshold. What option should I give in the Ruby script?

cwulfman commented 9 years ago

Was there ever an answer for this question? I'm having the same problem. This may not be the right place to ask, but how can I see the default configuration being used by the binary so I can pass that configuration into the gem?

meh commented 9 years ago

I honestly don't know. Someone would have to dig through the binary's source code to figure out which default options differ.

cwulfman commented 9 years ago

Ok; thank you.

amitdo commented 8 years ago

@meh, FYI, the default psm mode for tesseract command line is '3', while for libtesseract it's '6'.
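
One way to check whether the psm default explains the difference is to run the binary with the mode set explicitly and compare its text with what the gem returns. Below is a minimal Ruby sketch that shells out to the tesseract binary rather than using the gem. The helper names `tesseract_command` and `ocr_with_cli` are hypothetical. The `-psm` spelling is the 3.x one (newer releases spell it `--psm`), and `-c VAR=VALUE` is the option newer releases use for configuration variables; whether 3.02.02 accepts `-c` is an assumption that has not been verified here.

```ruby
require 'open3'
require 'tmpdir'

# Hypothetical helper: builds the argument list for the tesseract binary.
# psm_flag is '-psm' on 3.x releases; pass '--psm' for newer ones.
# variables are passed as "-c name=value" (assumed to be supported by
# the installed version).
def tesseract_command(image, outbase, psm:, psm_flag: '-psm', variables: {})
  cmd = ['tesseract', image, outbase, psm_flag, psm.to_s]
  variables.each { |name, value| cmd.push('-c', "#{name}=#{value}") }
  cmd
end

# Hypothetical helper: runs the binary, which writes "<outbase>.txt",
# and returns the recognized text. Raises if tesseract exits with an error.
def ocr_with_cli(image, **options)
  Dir.mktmpdir do |dir|
    outbase = File.join(dir, 'out')
    _out, err, status = Open3.capture3(*tesseract_command(image, outbase, **options))
    raise "tesseract failed: #{err}" unless status.success?
    File.read("#{outbase}.txt")
  end
end
```

Calling `ocr_with_cli('page.png', psm: 6)` should then use the same page segmentation mode as the library default, and `psm: 3` the command line default, so the two results can be compared against the gem's output for the same image. Extra variables could be tried with something like `variables: { matcher_good_threshold: 0.125 }`, where the value is only a placeholder.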