OCR on screenshot images is terrible (even after conversion to black and white)

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Run the attached image (test7.tif) through Tesseract. Ocrad 0.17 produced
significantly better results than Tesseract for this image.

What is the expected output? What do you see instead?
Better accuracy.

What version of the product are you using? On what operating system?
2.01, Windows

Please provide any additional information below.

Original issue reported on code.google.com by athenaar...@gmail.com on 25 Feb 2008 at 6:17

Attachments:

test7.tif

GoogleCodeExporter commented 9 years ago

Works well for me on images with large text. Results are best if I reduce the
images to 2-bit PNGs using IrfanView. I am using FreeOCR.

Original comment by tomatrot...@gmail.com on 16 May 2008 at 5:26
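
For anyone who wants to try the 2-bit reduction without IrfanView, here is a minimal sketch in plain Python. It assumes the image has already been loaded as rows of 8-bit grayscale values (0-255). The function name `quantize_2bit` is hypothetical, and the even four-level split is just one simple way to get 2 bits per pixel, not necessarily what IrfanView does.

```python
def quantize_2bit(pixels):
    """Map 8-bit grayscale values (0-255) to four gray levels (2 bits per pixel).

    `pixels` is a list of rows, each row a list of integers in 0..255.
    The even four-way split used here is an assumption for illustration.
    """
    levels = (0, 85, 170, 255)  # the four representable gray values
    # value * 4 // 256 gives a bucket index in 0..3 for any value in 0..255
    return [[levels[value * 4 // 256] for value in row] for row in pixels]


if __name__ == "__main__":
    sample = [[0, 63, 64, 128, 255]]
    print(quantize_2bit(sample))
```

Reading and writing the actual image file would need an imaging library, which is left out here to keep the sketch self-contained.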

GoogleCodeExporter commented 9 years ago

After cropping the image I got the following from Tesseract:

>>>>>>>>>>>>

Building the Software under Windows 95/98/NT/2000 with MS VC++

With Microsoftvisual C++ installed, and properly configured for commandline use [you will
likely need to source VCVAR$32.BAT in AUTOEXEC.bATor somewhere similar) you should be
able to use the provided makefi|e.vc.
The source package is delivered using Unix line termination conventions, which workwith
MSVC butdo not wo rk with Windows 'notepad'. lf you use unzip from the Info-Zip package,
you can extractthe files using Windows normal line termination conventionswith a command

<<<<<<<<<<<<<<<

This seems reasonably accurate, although I haven't compared it to the other
product mentioned.

Original comment by benkasminbullock on 29 Jun 2008 at 2:07

GoogleCodeExporter commented 9 years ago

See the FAQ entry on whether there is a minimum text size.
Also, page segmentation might help with this specific case.

Original comment by theraysm...@gmail.com on 29 Dec 2008 at 11:50

Changed state: WontFix
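
Since screenshot text is usually small, the minimum-text-size point can be turned into a quick calculation. The sketch below is an illustration only: the FAQ's exact numbers are not quoted in this thread, so the 20-pixel x-height target is an assumed rule of thumb (roughly 10pt text at 300 dpi), and `upscale_factor` and `scaled_size` are hypothetical helper names, not part of Tesseract.

```python
import math

# Assumed target: a lowercase x-height of about 20 px (roughly 10pt at 300 dpi).
# This is a rule of thumb chosen for illustration, not a Tesseract setting.
TARGET_X_HEIGHT_PX = 20


def upscale_factor(measured_x_height_px, target_px=TARGET_X_HEIGHT_PX):
    """Return the whole-number factor needed to bring the x-height up to target_px.

    Never returns less than 1, so large text is left at its original size.
    """
    if measured_x_height_px <= 0:
        raise ValueError("x-height must be positive")
    return max(1, math.ceil(target_px / measured_x_height_px))


def scaled_size(width, height, factor):
    """Return the image dimensions after enlarging by `factor`."""
    return width * factor, height * factor


if __name__ == "__main__":
    # A screenshot whose lowercase 'x' is about 7 px tall.
    factor = upscale_factor(7)
    print(factor, scaled_size(800, 600, factor))
```

Measure the height of a lowercase "x" in the screenshot, pass it in, and resize the image by the resulting factor in any image editor before running OCR.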

kareemu3 / tesseract-ocr #97