Training Seven Segment Display for OCR

dpanic commented 2 years ago

Hi there,

I have some questions about the PNG images used to train for seven-segment displays (SSD):

1. What image height is recommended? Is it 36 px?
2. Should I add noise to each image? If yes, why?
3. Should I train with a single digit per picture, or should I place multiple digits in one picture?
4. What about LEARNING_RATE and MAX_ITERATIONS? I currently use MAX_ITERATIONS=30000 and LEARNING_RATE=0.002.

Thank you!

matharano commented 2 years ago

Hi!

I'm quite new to all of this, but I've dug through a lot of documents hoping to learn how to fine-tune the SSD model for my application. I should say that I'm certainly not the best person to help you, but I'll try:

1. SSD is based on the English trained data, for which people have found that a height of 30-33 px per digit works best (https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ). Note that this refers to the digit height, not the image height. I've been using 33 px and the results look pretty good.
2. It depends on your application. If you're sure your input images will be almost perfectly clean (through pre-processing or simply by capturing them well), I think you can consider not adding noise to the training data. Almost every application will have some problem with noise, though.
3. As I understand it, Tesseract uses the shapes of the digits to estimate the text lines and columns each digit sits in, so that it can determine the boundaries between digits (https://static.googleusercontent.com/media/research.google.com/pt-BR//pubs/archive/33418.pdf). That's why I'd guess it works better if you pass multiple rows of digit sequences to the training engine. You can find an example here: https://github.com/astutejoe/tesseract-tutorial/blob/main/train/eng.AgencyFB_Bold.exp0.tif
4. I have no idea :(
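
Points 1 and 2 can be sketched in a few lines of plain Python. This is only an illustration under assumptions that are not from the thread: the image is a list of rows of 0-255 grayscale values with dark digits on a light background, the threshold of 128 is arbitrary, and the helper names (`digit_rows`, `scale_for_digit_height`, `add_salt_and_pepper`) are hypothetical. Salt-and-pepper noise is just one possible noise model, chosen here for simplicity.

```python
import random

TARGET_DIGIT_HEIGHT = 33  # px, the digit height reported to work well above


def digit_rows(pixels, threshold=128):
    """Return (top, bottom) indices of the rows that contain dark pixels.

    Assumes dark digits on a light background (an assumption of this sketch).
    """
    inked = [y for y, row in enumerate(pixels) if any(v < threshold for v in row)]
    if not inked:
        raise ValueError("no pixel darker than the threshold was found")
    return inked[0], inked[-1]


def scale_for_digit_height(pixels, target=TARGET_DIGIT_HEIGHT, threshold=128):
    """Return the resize factor that brings the digit height to `target` px."""
    top, bottom = digit_rows(pixels, threshold)
    return target / (bottom - top + 1)


def add_salt_and_pepper(pixels, fraction, seed=None):
    """Return a copy with about `fraction` of the pixels set to 0 or 255."""
    rng = random.Random(seed)
    noisy = [row[:] for row in pixels]  # copy, so the input stays untouched
    height, width = len(noisy), len(noisy[0])
    count = round(fraction * height * width)
    for index in rng.sample(range(height * width), count):
        y, x = divmod(index, width)
        noisy[y][x] = rng.choice((0, 255))
    return noisy
```

The returned factor would then be passed to whatever image library does the actual resizing, for example by multiplying both the width and the height of the image by it, so that the digits (not the whole image) end up about 33 px tall.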

Also, have you tried the 7seg.traineddata? It worked the best for me.

Hope it helps!

dpanic commented 2 years ago

Thank you so much for the answers.

In the meantime I have figured out 1, 2 and 3 myself. Regarding 7seg.traineddata: yes, I have tried it. Letsgodigital (https://github.com/arturaugusto/display_ocr) works better for me, with a hit rate of around 86%, while 7seg is around 70%.
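
The thread does not say how the hit rate was measured. One plausible way, assuming it means the share of display readings that are recognized exactly, is the sketch below; the function name `hit_rate` and the sample readings are hypothetical.

```python
def hit_rate(predictions, ground_truth):
    """Return the fraction of readings whose OCR output matches exactly.

    Surrounding whitespace is ignored, since OCR output often ends in a
    newline. Exact-match scoring is an assumption of this sketch.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth differ in length")
    if not ground_truth:
        raise ValueError("at least one reading is required")
    hits = sum(p.strip() == t.strip() for p, t in zip(predictions, ground_truth))
    return hits / len(ground_truth)


# Hypothetical readings: three of the four match the expected values.
print(hit_rate(["12.5\n", "88", "0.07", "431"], ["12.5", "89", "0.07", "431"]))
```

Scoring whole readings is stricter than scoring single digits, so two models should only be compared when both are measured the same way on the same images.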

I have decided to do the training myself.

I've been using Tesseract 3.x instead of 4.x, because 3.x works better for characters.

tesseract-ocr / tesstrain

Training Seven Segment Display for OCR #282