raffaeldantas / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Shape Clustering for training, do or not do? #948

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Use "shapeclustering" while training tesseract

What is the expected output? What do you see instead?
Accurate recognition for characters which look similar

What version of the product are you using? On what operating system?
Tesseract 3.0.2

Please provide any additional information below.
I am very curious about what shapeclustering actually does for my training. I 
ran two trainings, one with and one without shape clustering. With shapeclustering, 
recognition of characters like "1" and "I", "O" and "Q", "6" and "G" is not 
good; error rates are quite high. Without shapeclustering, it works all right: 
these characters are recognized correctly more than 98% of the time.

Did I do something wrong? Is this supposed to happen?
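
For anyone who wants to put numbers on a comparison like this, here is a minimal sketch in Python. It does not call Tesseract. It assumes you already have the ground-truth text and the recognized text as strings aligned character-for-character. The function name `pair_accuracy`, the `CONFUSABLE_PAIRS` list, and the sample strings are all hypothetical and only there for illustration.

```python
from collections import Counter

# Confusable pairs mentioned in the report; extend as needed (illustrative only).
CONFUSABLE_PAIRS = [("1", "I"), ("O", "Q"), ("6", "G")]


def pair_accuracy(truth, predicted, pairs=CONFUSABLE_PAIRS):
    """Return {pair: fraction correct} for positions whose true char is in the pair.

    Assumes `truth` and `predicted` are aligned character-for-character;
    pairs that never occur in `truth` are left out of the result.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    totals = Counter()
    correct = Counter()
    for t, p in zip(truth, predicted):
        for pair in pairs:
            if t in pair:
                totals[pair] += 1
                if t == p:
                    correct[pair] += 1
    return {pair: correct[pair] / totals[pair] for pair in pairs if totals[pair]}


if __name__ == "__main__":
    # Made-up sample data, not real OCR output.
    truth = "1I1IOQOQ6G6G"
    with_clustering = "II1IOOOQ6G66"
    without_clustering = "1I1IOQOQ6G6G"
    print("with shapeclustering:   ", pair_accuracy(truth, with_clustering))
    print("without shapeclustering:", pair_accuracy(truth, without_clustering))
```

Running the same check on the output of both traineddata files, over the same test images, should show whether the drop is limited to the similar-looking pairs or is more general.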

Original issue reported on code.google.com by swe...@gmail.com on 11 Jul 2013 at 4:50

GoogleCodeExporter commented 9 years ago
See https://code.google.com/p/tesseract-ocr/wiki/FAQ#Rules_and_advices

Original comment by zde...@gmail.com on 12 Jul 2013 at 7:43

GoogleCodeExporter commented 9 years ago
From the DEV list

On Mon, Jul 15, 2013 at 10:01 AM, Ray Smith <theray.....> wrote:

    The idea of shape clustering is that it should help to resolve exactly the errors that you observe! At the moment, though, it doesn't work too well for most languages. It currently should not be used except for the Indic languages, where it does seem to help.
    Ray.

    On Sun, Jul 14, 2013 at 7:54 PM, Shane Wee <sw......> wrote:

        I am using tesseract 3.0.2. I trained my data with shapeclustering included, and the result is not as good as the traineddata I got when excluding shapeclustering.
        Shapeclustering seems to cause recognition errors on similarly shaped characters such as 1 and I, O and Q, 5 and S.
        I am quite sure I followed the training steps correctly.
        My question is whether shapeclustering is really important. If I exclude it from my training, will I miss out on anything important?

Original comment by shreeshrii on 16 Jul 2013 at 4:16

GoogleCodeExporter commented 9 years ago
Thank you so much for the info! :)

Original comment by swe...@gmail.com on 16 Jul 2013 at 7:05