Closed GoogleCodeExporter closed 9 years ago
Have you read the first steps on the wiki page?
----8<------8<---
Make sure there are a minimum number of samples of each character. 10 is good,
but 5 is OK for rare characters.
There should be more samples of the more frequent characters - at least 20.
----8<------8<---
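A quick way to check the sample counts quoted above is to tally the characters in the box file. This is only a rough sketch: it assumes the usual box file layout of one glyph per line ("char left bottom right top page"), and the helper names and thresholds are made up for illustration.

```python
from collections import Counter


def count_box_samples(box_text):
    """Count how many boxes each character has in box-file text."""
    counts = Counter()
    for line in box_text.splitlines():
        fields = line.split()
        # Assumed layout: char left bottom right top [page]
        if len(fields) >= 5:
            counts[fields[0]] += 1
    return counts


def undersampled(counts, minimum=10):
    """Return the characters that have fewer than `minimum` samples."""
    return {ch: n for ch, n in counts.items() if n < minimum}


# Tiny made-up box file: two samples of "A", one of "B".
example = "A 10 20 30 40 0\nB 35 20 55 40 0\nA 60 20 80 40 0\n"
print(undersampled(count_box_samples(example), minimum=2))
```

Running it on the real box file with minimum=10 (or 5 for rare characters) would list the characters that still need more samples.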
Original comment by pe...@hhoefling.de
on 15 Mar 2012 at 1:52
I will try to write a text with 20 samples of each character.
Original comment by maximili...@gmail.com
on 15 Mar 2012 at 2:25
Please provide the text file used to generate the tif file, so it can be tested
further. The output from the tif file does not come out in the correct order
even though there is no misspelling, so I feel something is wrong with the tif
file itself.
If the text file is uploaded, I will generate a tif and its box file using the
Arial font for testing purposes.
Original comment by withbles...@gmail.com
on 15 Mar 2012 at 4:27
One issue is that the layout recognition phase of tesseract is currently
returning 8 columns for the alphabet area (and skips the last "I R" column).
It therefore decides that the text is running top-to-bottom, rather than
left-to-right.
See attached image: "police-Text Lines (RIL_TEXTLINE).png".
Unfortunately, using -psm 4 (PSM_SINGLE_COLUMN) crashes tesseract (see Issue
653).
-psm 6 (PSM_SINGLE_BLOCK) does cause text rows to be used (see "police-Text
Lines (RIL_TEXTLINE) PSM-6.png") and gives the following OCR results:
ABCIJEFGHI
JKLMNUPUR
STUVWXYZ
123455789
This might be the result of the tif saying it's 96DPI and therefore 16.67 sq
inches? That's pretty big. However, changing the DPI to 300DPI or 600DPI
doesn't seem to fix things.
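For reference, the DPI arithmetic is just pixels divided by DPI. The sketch below assumes the image is about 1600 pixels on a side, which is only inferred from 16.67 inches at 96DPI (reading "16.67 sq inches" as 16.67 inches per side); the real tif dimensions are not stated in this thread.

```python
def physical_size_inches(width_px, height_px, dpi):
    """Physical size implied by the pixel dimensions and the declared DPI."""
    return width_px / dpi, height_px / dpi


# 1600 px is an assumption inferred from 16.67 in x 96 DPI, not a measured value.
for dpi in (96, 300, 600):
    width_in, height_in = physical_size_inches(1600, 1600, dpi)
    print(f"{dpi} DPI: {width_in:.2f} x {height_in:.2f} inches")
```

The pixel data is the same in all three cases; only the physical size claimed in the file metadata changes.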
The layout is correctly finding the characters (see "police-Connected
Components (RIL_SYMBOL) PSM-6.png"). I'm not sure why it decides to split the D
into I & J.
Possibly the relatively poor OCR is because these aren't "words" but single,
separated letters. You might have to use PSM_SINGLE_CHAR mode with each of
the boxes returned by TessBaseAPI::GetConnectedComponents().
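As a rough command-line equivalent of that idea, each character could be cropped to its own image and run through tesseract in single-character mode (page segmentation mode 10 is PSM_SINGLE_CHAR). The sketch below only builds and runs the command; the cropping step and the file name are hypothetical, and it assumes a tesseract build that accepts "stdout" as the output base.

```python
import subprocess


def single_char_command(image_path, legacy_flag=True):
    """Build a tesseract CLI call that treats the image as a single character."""
    # Page segmentation mode 10 is PSM_SINGLE_CHAR.
    # Tesseract 3.x spells the option "-psm"; 4.x and later use "--psm".
    flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", image_path, "stdout", flag, "10"]


def recognize_single_char(image_path, legacy_flag=True):
    """Run tesseract on one pre-cropped character image and return its text."""
    result = subprocess.run(
        single_char_command(image_path, legacy_flag),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# "char_D.tif" is a hypothetical crop of one character from the source image.
print(" ".join(single_char_command("char_D.tif")))
```

Calling recognize_single_char() on each cropped box and joining the results would approximate the per-component approach suggested above, at the cost of one tesseract process per character.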
Original comment by tomp2...@gmail.com
on 16 Mar 2012 at 5:03
Attachments:
Hello,
I tried with a new .tif file with many characters (plaque.tif).
The result is better but still not very good (source image for testing:
test.tif; result: test.txt).
Can I do something to improve it?
Thank you,
Original comment by maximili...@gmail.com
on 21 Mar 2012 at 9:18
Attachments:
No solution for my problem?
Original comment by maximili...@gmail.com
on 25 Mar 2012 at 10:01
Did you try 3.02? Can you post the plaque.box file?
Original comment by zde...@gmail.com
on 3 Jan 2013 at 10:14
Closed because of missing input from the issue reporter.
Original comment by zde...@gmail.com
on 20 Dec 2013 at 10:57
Original issue reported on code.google.com by
maximili...@gmail.com
on 15 Mar 2012 at 11:36
Attachments: