jacklicn / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

The quality of the debug image after OCR is much poorer than the original file's. #554

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.
Before OCR, call SetVariable("tessedit_dump_pageseg_images", "true");
or create a config file named "debugimaeg" containing
"tessedit_dump_pageseg_images T".
2.
Run Tesseract as usual. If you are using the config file,
remember to add it as an extra parameter after the language: "debugimaeg".
For example: "tesseract image.bmp image -l eng debugimaeg"
This option saves the images that Tesseract processes during OCR.
The files are named as follows (the extension depends on the input file):
tessinput.xxx   tessnoimagesi.xxx  tessnolines.xxx
Choose one of them to compare with the original input image.
In my attached image, the quality of the processed image
is noticeably poorer than that of the original.
This is likely to affect the OCR result.
Please see the attached file for details.
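
The config-file route in the steps above can be sketched as a small script. This is only an illustration: the helper names write_debug_config and build_command are hypothetical, the config file name and its one-line content are taken from the steps above, and the script assumes Tesseract can find the config file at the path it is given. It only builds the command line; it does not run Tesseract.

```python
import os
import tempfile


def write_debug_config(directory, name="debugimaeg"):
    """Write the one-line config file described in step 1 (hypothetical helper)."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        # "T" turns the tessedit_dump_pageseg_images variable on.
        f.write("tessedit_dump_pageseg_images T\n")
    return path


def build_command(image, output_base, lang, config):
    """Build the argument list from step 2; the config goes after the language."""
    return ["tesseract", image, output_base, "-l", lang, config]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config_path = write_debug_config(tmp)
        # Print the command instead of running it, so the sketch works
        # even on a machine without Tesseract installed.
        print(" ".join(build_command("image.bmp", "image", "eng", config_path)))
```

To actually run it, the argument list could be passed to subprocess.run, assuming the tesseract executable is on the PATH.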

BTW:
How can I stop Tesseract from doing image processing (thresholding, grayscale 
conversion, etc.) and just OCR the image exactly as I supply it? Sometimes the 
image has already been processed before OCR.

Please use labels and text to provide additional information.
Win7 32bit  AMD 4GB RAM   Tesseract 3.01 (r626)

Original issue reported on code.google.com by wowgreat...@gmail.com on 29 Sep 2011 at 5:08


GoogleCodeExporter commented 9 years ago
Tesseract runs OCR on a binary image (http://en.wikipedia.org/wiki/Binary_image), so 
it needs to convert your 16.7-million-color, 32-bits-per-pixel image (or 90 unique 
colors) to 2 colors with simple Otsu thresholding [1].

If you do not like this algorithm, provide Tesseract with an already binarized image.

[1] 
http://code.google.com/p/tesseract-ocr/source/browse/trunk/ccstruct/otsuthr.cpp
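
For readers unfamiliar with it, here is a minimal sketch of textbook global Otsu thresholding on a flat list of 8-bit gray values. It is not the code from otsuthr.cpp, and the function names otsu_threshold and binarize are made up for illustration; it only shows the idea of picking the threshold that best separates dark and light pixels.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels <= t form one class and pixels > t form the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of gray values at or below the candidate threshold
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break     # no pixels left for the upper class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels, t):
    """Map each pixel to 0 (at or below t) or 1 (above t)."""
    return [1 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    gray = [10] * 50 + [200] * 50  # toy "image": half dark, half light
    t = otsu_threshold(gray)
    print(t, sum(binarize(gray, t)))
```

A single global threshold like this can merge or erase faint strokes when the background is uneven, which is one reason a binarized debug image can look worse than the input.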

Original comment by zde...@gmail.com on 21 Jul 2012 at 3:46

GoogleCodeExporter commented 9 years ago
How do I feed an already binarized image to tesseract.exe?

Original comment by tempname...@gmail.com on 20 Mar 2014 at 6:57