OCR works better when image is

mmoghimi / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

OCR works better when image is #1211

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Run an image through tesseract
2. Scale the image by 3x, cubic interpolation in GIMP
3. Run the bigger image through tesseract

What is the expected output? What do you see instead?
Step 1 produces garbage:

"Is very small The quesllon naturally zlrlscs. whethcr lhls dlsplacemenl IS"

Step 3 produces decent output:

"is very small. The question naturally arises whether this displacement is"

What version of the product are you using? On what operating system?

3.02.02 on Windows

Please provide any additional information below.

We should not have to manually resize images. The tesseract software should 
automatically resample the image internally to a suitable resolution before 
running OCR on it.

Original issue reported on code.google.com by omegat...@gmail.com on 23 May 2014 at 11:55
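
A minimal sketch of what such automatic resampling might compute, assuming the image carries a trustworthy DPI value and that roughly 300 DPI is a sensible target (a commonly cited figure, used here as an assumption). The function name and parameters are hypothetical and are not part of Tesseract.

```python
from typing import Tuple


def resampled_size(width: int, height: int, source_dpi: float,
                   target_dpi: float = 300.0) -> Tuple[int, int]:
    """Return the pixel size the image would have at target_dpi.

    target_dpi=300 is an assumed default, not a value taken from Tesseract.
    """
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    scale = target_dpi / source_dpi
    return round(width * scale), round(height * scale)


# A 640x480 image declared as 100 DPI would be enlarged by a factor of 3,
# the same factor used manually in the report above.
print(resampled_size(640, 480, source_dpi=100))
```

The resize itself (for example with a bicubic filter in an imaging library) is left out. Screenshots often carry no meaningful DPI, so a real implementation would likely need some other size cue.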

GoogleCodeExporter commented 9 years ago

supposed to be "when image is resized"

Original comment by omegat...@gmail.com on 23 May 2014 at 11:55

GoogleCodeExporter commented 9 years ago

I tried tesseract a few times in the past, thought it was garbage, and 
immediately uninstalled it because 99% of the output was corrupt like this. 
Then today I saw the comment linked below and decided to give it another try, 
and I was shocked that it works great if you just resize the images!

https://sourceforge.net/p/greenshot/feature-requests/517/#53e5

Original comment by omegat...@gmail.com on 24 May 2014 at 2:41

GoogleCodeExporter commented 9 years ago

Tesseract is an ENGINE and suite. Preprocessing should be done by the 
user/program. Resizing is only one solution, and it does not fix everything [1]

[1] https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality

Original comment by zde...@gmail.com on 24 May 2014 at 1:43

Changed state: Invalid

GoogleCodeExporter commented 9 years ago

Adjusting the characters so they are recognized correctly is the job of the OCR 
ENGINE, not the user.  The user is not adding any information by resizing the 
image, they are just compensating for a flaw in the OCR engine, which can only 
recognize text at certain DPIs.  The OCR engine should handle this 
automatically.

Original comment by omegat...@gmail.com on 24 May 2014 at 1:51

GoogleCodeExporter commented 9 years ago

Here's a proof of concept of code that finds the x-height in pixels of the 
lines of text in the image: 
https://gist.github.com/endolith/334196bac1cac45a4893

Is there an optimum X height in pixels for the OCR engine to work correctly?

Original comment by omegat...@gmail.com on 24 May 2014 at 1:52
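
One way the x-height idea could be turned into a scale factor is sketched below. This is not the code from the linked gist; it is a simplified illustration that assumes a single, already binarized text line given as rows of 0/1 values, and it treats the rows with the most ink as the x-height band. The 0.5 cutoff and the target of 20 pixels are assumptions chosen for illustration, not values confirmed by the Tesseract developers in this thread.

```python
from typing import List, Sequence


def row_ink_profile(line: Sequence[Sequence[int]]) -> List[int]:
    """Count dark pixels (value 1) in each row of a binarized text line."""
    return [sum(row) for row in line]


def estimate_x_height(line: Sequence[Sequence[int]],
                      threshold: float = 0.5) -> int:
    """Estimate x-height as the longest run of rows with heavy ink.

    Rows crossing the bodies of lowercase letters hold far more ink than
    rows that only cross ascenders or descenders, so rows at or above
    threshold * peak are treated as the x-height band (an assumption).
    """
    profile = row_ink_profile(line)
    peak = max(profile, default=0)
    if peak == 0:
        return 0  # blank line, nothing to measure
    cutoff = threshold * peak
    best = run = 0
    for ink in profile:
        run = run + 1 if ink >= cutoff else 0
        best = max(best, run)
    return best


def scale_for_x_height(measured: int, target: float = 20.0) -> float:
    """Scale factor that would bring the measured x-height to the target."""
    if measured <= 0:
        raise ValueError("measured x-height must be positive")
    return target / measured


# Synthetic line: 2 sparse ascender rows, 6 dense body rows, 2 sparse
# descender rows, each 10 pixels wide.
sparse = [1] + [0] * 9
dense = [1] * 8 + [0] * 2
line = [sparse] * 2 + [dense] * 6 + [sparse] * 2
measured = estimate_x_height(line)
print(measured, scale_for_x_height(measured))
```

Real scans would need line segmentation and noise handling first, which this sketch leaves out.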

GoogleCodeExporter commented 9 years ago

Image preprocessing is not the task of the OCR engine. There are also users who 
ask that the OCR engine not binarize images. Take it or leave it...

Original comment by zde...@gmail.com on 24 May 2014 at 2:17

GoogleCodeExporter commented 9 years ago

Yes, processing the input so that the characters are correctly optically 
recognized is part of the job of the optical character recognition engine.

Obviously the user should be able to turn off the binarization or rescaling 
steps if desired, and I don't care if the job is split up between multiple exe 
files, such as a "preprocessor" that does the binarization and an "engine", but 
the software should provide a tool that, by default, applies the options that 
produce the best results.

Original comment by omegat...@gmail.com on 24 May 2014 at 2:32
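
The "on by default, but switchable" design requested in the last comment could look roughly like the sketch below. Everything in it is hypothetical: the option names, the default 3x factor (borrowed from the original report), the fixed threshold of 128, and the nearest-neighbour upscaling that stands in for the cubic interpolation mentioned earlier. It operates on a grayscale image held as nested lists so that it has no dependencies.

```python
from dataclasses import dataclass
from typing import List, Optional

Image = List[List[int]]  # grayscale rows, 0 (black) to 255 (white)


@dataclass
class PreprocessOptions:
    """Hypothetical switches; both steps are enabled unless turned off."""
    rescale: bool = True
    scale_factor: int = 3      # assumed default, taken from the 3x report
    binarize: bool = True
    threshold: int = 128       # assumed fixed threshold for illustration


def upscale_nearest(image: Image, factor: int) -> Image:
    """Enlarge by an integer factor by repeating pixels and rows."""
    return [[px for px in row for _ in range(factor)]
            for row in image for _ in range(factor)]


def binarize(image: Image, threshold: int) -> Image:
    """Map pixels below the threshold to black and the rest to white."""
    return [[0 if px < threshold else 255 for px in row] for row in image]


def preprocess(image: Image,
               options: Optional[PreprocessOptions] = None) -> Image:
    """Apply the enabled steps in order: rescale first, then binarize."""
    options = options or PreprocessOptions()
    if options.rescale:
        image = upscale_nearest(image, options.scale_factor)
    if options.binarize:
        image = binarize(image, options.threshold)
    return image


tiny = [[10, 200]]
print(preprocess(tiny))  # defaults: both steps applied
print(preprocess(tiny, PreprocessOptions(rescale=False, binarize=False)))
```

Users who want the raw image passed through, as in the objection about binarization, would disable both switches and get the input back unchanged.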