jacklicn / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

How to disable noise removal / recognize small text #576

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I am trying to recognize text captured from the screen. All preprocessing is
done by my program, leaving only bare text. The text is rather small, 5x6
pixels.

I've attempted to train my own data set including a dictionary (I know exactly 
what words will be displayed) and training boxes.

Recognition is poor at best even though the image is almost clean: a crude OCR
based on a convolution kernel gives me better results than Tesseract has so
far...

So is there a method to improve this? I reckon it has something to do with 
noise removal that I read somewhere; is there a config option that allows me to 
disable this?

Platform: Ubuntu 11.10 64-bit
Usage: currently from the command line; if I can get it to work, directly through the C++ API.

Original issue reported on code.google.com by cyberwizzard on 12 Nov 2011 at 7:25

GoogleCodeExporter commented 9 years ago
Images captured from the screen are usually 96 DPI. To get good OCR results
you need images of 200 DPI or more.

http://code.google.com/p/tesseract-ocr/wiki/FAQ#Is_there_a_Minimum_Text_Size?_(It_won't_read_screen_text!)
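
The DPI advice above amounts to enlarging the screen capture before handing it to Tesseract. A rough sketch of that arithmetic in plain Python follows. The helper names `scale_factor` and `upscale_nearest` and the sample glyph are made up for illustration and are not part of Tesseract; nearest-neighbour scaling is used only because it is the simplest thing to show, not because it is known to give the best recognition.

```python
import math


def scale_factor(source_dpi, target_dpi):
    """Smallest integer factor that lifts source_dpi to at least target_dpi."""
    return max(1, math.ceil(target_dpi / source_dpi))


def upscale_nearest(pixels, factor):
    """Enlarge a 2D list of pixel values by repeating every pixel
    factor times horizontally and every row factor times vertically."""
    out = []
    for row in pixels:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Hypothetical 5x6 glyph (5 wide, 6 tall): 1 = ink, 0 = background.
glyph = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]

factor = scale_factor(96, 200)        # 200 / 96 is about 2.08, rounded up
big = upscale_nearest(glyph, factor)
print(factor, len(big[0]), len(big))  # factor, new width, new height
```

With a 96 DPI capture and a 200 DPI target the factor comes out as 3, so the 5x6 glyph becomes 15x18 pixels, which corresponds to an effective 288 DPI. For real images an imaging library with a smoother interpolation method would probably be a better choice than repeating pixels.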

Original comment by zde...@gmail.com on 12 Nov 2011 at 10:14

GoogleCodeExporter commented 9 years ago
Changing status because the issue is covered in the FAQ.

Original comment by zde...@gmail.com on 6 Mar 2012 at 8:31