patcharats / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Tesseract(v1.03) is not working for any free size image #65

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I downloaded Tesseract OCR (v1.03) from www.sourceforge.net, and it 
meets my requirements. However, it does not work for all image 
resolutions. For example, it works with 640x480, 960x720 and 
1600x1200, but it does not work with arbitrarily sized images such as 
2200x1700, 645x485 or 600x300. 

I then downloaded Tesseract 2.01 and tested it with phototest.tif. It 
works and gives the correct result (Results_phototest.tif.log). Now I am 
testing a BMP file which has the following properties: 

W - 600 px 
H - 700 px 
Res - 96 dpi 
Bit Depth - 1 
Frame Count - 1 

This BMP contains the following text in the middle of the image: 
              Font Size - 12 
              Font Name - Times New Roman 
              MS-Paint 
              File Name - 600x700Mono.bmp 

I built Tesseract 2.01 and ran dlltest.exe, but I am NOT getting the 
correct result for this bitmap file. Please see the result 
(Results_600x700Mono_bmp.log).

c:> dlltest.exe 600x700Mono.bmp Results_600x700Mono_bmp.log

Please see the attached Results_600x700Mono_bmp.log for the output. 

I am using the Windows XP operating system.

Original issue reported on code.google.com by anujkg...@gmail.com on 5 Sep 2007 at 11:35

GoogleCodeExporter commented 9 years ago
The text is too small. Below 8 pt @ 300 dpi, accuracy drops off really quickly to
nothing. This text is 12 pt @ 96 dpi => roughly 4 pt @ 300 dpi.
Your possible solutions are:
Train Tesseract specifically on this font/size combination.
Pre-scale the image to a bigger size.
Get together with the other people with the same problem to write a
problem-specific OCR module.
BTW why are you trying to OCR screen text?

Original comment by theraysm...@gmail.com on 5 Sep 2007 at 11:56
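
For readers following the size arithmetic in the reply above, here is a minimal Python sketch of the conversion and of the "pre-scale the image" option. It is not part of Tesseract or of the original thread: the function names are hypothetical, the 8 pt @ 300 dpi threshold is taken from the comment, and nearest-neighbour scaling on a plain list of pixel rows is an assumption chosen only to keep the example dependency-free. A real pipeline would more likely resize with an image library and a smoother filter.

```python
import math


def effective_point_size(font_pt, image_dpi, target_dpi=300):
    """Apparent point size of the text when measured against target_dpi."""
    return font_pt * image_dpi / target_dpi


def min_scale_factor(font_pt, image_dpi, min_pt=8, target_dpi=300):
    """Smallest integer upscaling factor that lifts the text to at least min_pt.

    min_pt=8 is the threshold quoted in the reply, not a documented constant.
    """
    effective = effective_point_size(font_pt, image_dpi, target_dpi)
    return max(1, math.ceil(min_pt / effective))


def upscale_nearest(pixels, factor):
    """Nearest-neighbour upscaling of a 2-D list of pixel values by an integer factor."""
    scaled = []
    for row in pixels:
        # Repeat every pixel horizontally, then repeat the widened row vertically.
        wide_row = [value for value in row for _ in range(factor)]
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


if __name__ == "__main__":
    # 12 pt text at 96 dpi, as in the reported 600x700 bitmap.
    print(effective_point_size(12, 96))
    print(min_scale_factor(12, 96))
    # Tiny stand-in for a 1-bit bitmap.
    print(upscale_nearest([[0, 1], [1, 0]], 2))
```

Under these assumptions the 12 pt, 96 dpi text works out to 3.84 pt at 300 dpi, which matches the "4 pt" estimate in the reply. A scale factor of 3 would bring it above the 8 pt mark and turn the 600x700 image into 1800x2100 pixels.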