jacklicn / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

'W' instead of 'w' #533

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. tesseract 038.tif 038 -l eng

What is the expected output? What do you see instead?
In a few places, Tesseract recognizes 'W' instead of 'w'.

What version of the product are you using? On what operating system?
I am using Linux (Arch Linux). I tried the official stable version 3.00 from the
community repo, and I also tried tesseract-svn revision 602 (from AUR), with
almost the same result.

Please provide any additional information below.
I attached the 038.tif and the output txt files.

Original issue reported on code.google.com by bertz...@gmail.com on 13 Aug 2011 at 5:53


GoogleCodeExporter commented 9 years ago
This is a known issue. Tightening the parameters that could fix it
introduces other problems. We are investigating a better fix.

Original comment by theraysm...@gmail.com on 24 Sep 2012 at 10:02

GoogleCodeExporter commented 9 years ago
Same problem with German! In 1 of 4 cases, Tesseract recognizes "W"
instead of "w".

Original comment by thomas.l...@gmail.com on 3 Jan 2014 at 3:56

GoogleCodeExporter commented 9 years ago
I tested the issue with the current code (svn r1092). The result is attached.
Based on the provided image, the "W" vs "w" issue is fixed.

Original comment by zde...@gmail.com on 4 May 2014 at 9:18
