How to disable noise removal/ Recognize small text

GoogleCodeExporter commented 9 years ago

I am attempting to recognize text captured from the screen. All preprocessing is 
done by my program, leaving only bare text. The text is rather small: 5x6 
pixels.

I've attempted to train my own data set including a dictionary (I know exactly 
what words will be displayed) and training boxes.

Recognition is poor at best even though the image is almost clean: a crude OCR 
based on a convolution kernel gives me better results than Tesseract has 
managed so far...

So is there a way to improve this? I reckon it has something to do with the 
noise removal I read about somewhere; is there a config option that lets me 
disable it?

Platform: Ubuntu 11.10 64-bit
Usage: Currently from the command line; if I can get it to work, directly through the C++ API.

Original issue reported on code.google.com by cyberwizzard on 12 Nov 2011 at 7:25

GoogleCodeExporter commented 9 years ago

Images captured from the screen are usually 96 DPI. To get good OCR results you 
need images of 200 DPI or more.

http://code.google.com/p/tesseract-ocr/wiki/FAQ#Is_there_a_Minimum_Text_Size?_(It_won't_read_screen_text!)
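
To make the DPI advice concrete, here is a minimal sketch of the arithmetic, assuming the capture is 96 DPI and the goal is at least 200 DPI. The helper names `scale_factor` and `upscale_nearest` are hypothetical, the glyph is made-up sample data, and nearest-neighbour enlargement on a plain list of lists stands in for whatever resampling an image library would do; it is not how Tesseract itself handles resolution.

```python
import math


def scale_factor(source_dpi, target_dpi):
    """Smallest integer factor that lifts source_dpi to at least target_dpi."""
    return max(1, math.ceil(target_dpi / source_dpi))


def upscale_nearest(pixels, factor):
    """Enlarge a 2D list of pixel values by an integer factor.

    Each source pixel becomes a factor x factor block (nearest neighbour).
    """
    out = []
    for row in pixels:
        # Repeat every pixel horizontally, then repeat the widened row vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Hypothetical 5x6 glyph (5 columns, 6 rows), 1 = ink, 0 = background.
glyph = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]

factor = scale_factor(96, 200)
big = upscale_nearest(glyph, factor)
print(factor, len(big[0]), len(big))  # prints: 3 15 18
```

With these assumptions a 5x6 glyph becomes 15x18 pixels. In practice a smoother interpolation (bicubic, for example) from an image library is likely to suit OCR better than the blocky result produced here.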

Original comment by zde...@gmail.com on 12 Nov 2011 at 10:14

Changed state: Look-here-for-help

GoogleCodeExporter commented 9 years ago

Changing status because the issue is covered in the FAQ.

Original comment by zde...@gmail.com on 6 Mar 2012 at 8:31

Changed state: WontFix

jacklicn / tesseract-ocr

How to disable noise removal/ Recognize small text #576