weinman / cnn_lstm_ctc_ocr

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR
GNU General Public License v3.0

How to deal with single character input #50

Closed xinli94 closed 5 years ago

xinli94 commented 5 years ago

Hi,

When I created tfrecords for my custom dataset, a lot of images got filtered out. Each input image contains only one character, so the processed image width is less than min_width (https://github.com/weinman/cnn_lstm_ctc_ocr/blob/master/src/mjsynth-tfrecord.py#L143).

What is the correct way to deal with single-character inputs? Should I set min_width to a smaller value (I already tried 3, and many images were still filtered out), or should I pad the input image with zeros?

Thanks, Xin

weinman commented 5 years ago

The pooling operations reduce the width of the features, so it turns out that recognizing a single character requires an input at least 8 pixels wide (10 pixels for two characters). So that's a hard limit on the input data.
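To make the arithmetic concrete, here is a minimal sketch of how such a width limit can arise. The reduction used in `sequence_length` (trim 2 columns, halve, then trim 2 more) is an assumption chosen because it reproduces the two figures quoted above; it is not copied from the repository, and both function names are hypothetical. The constraint it illustrates is that CTC needs at least one output time step per label character.

```python
def sequence_length(width: int) -> int:
    """Assumed mapping from input width (pixels) to output time steps.

    Illustrative only: chosen to match the 8 px / 10 px figures in this
    thread, not taken from the model code.
    """
    after_conv = width - 2         # assumed: first unpadded 3-wide conv trims 2 columns
    after_pool = after_conv // 2   # assumed: one pooling stage halves the width
    return after_pool - 2          # assumed: two later stages each trim 1 column


def min_width_for(num_chars: int) -> int:
    """Smallest width whose sequence length covers num_chars labels."""
    width = 1
    while sequence_length(width) < num_chars:
        width += 1
    return width


if __name__ == "__main__":
    for n in (1, 2):
        print(n, "character(s) need width >=", min_width_for(n))
```

Under this assumed reduction, lowering min_width below 8 cannot help, because narrower images produce no output time steps at all.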

You could pad with zeros, but the results may not be very good, since there will be strong filter responses at the edges.
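If you still want to try padding, a minimal NumPy sketch might look like the following. The helper name `pad_to_min_width` and its `mode` parameter are hypothetical and not part of this repository; the default of 8 pixels comes from the limit mentioned above. `mode="constant"` gives the zero padding discussed here. `mode="edge"` (replicating the border columns) is an untested alternative that might soften the edge responses, not something verified in this thread.

```python
import numpy as np


def pad_to_min_width(image: np.ndarray, min_width: int = 8,
                     mode: str = "constant") -> np.ndarray:
    """Pad an H x W (or H x W x C) image horizontally up to min_width.

    Hypothetical helper for illustration; padding is split between the
    left and right sides, with any odd column going to the right.
    """
    width = image.shape[1]
    deficit = max(0, min_width - width)
    left = deficit // 2
    right = deficit - left
    # Pad only the width axis; leave height and any channel axis alone.
    pad_spec = [(0, 0), (left, right)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad_spec, mode=mode)


if __name__ == "__main__":
    narrow = np.ones((32, 3), dtype=np.uint8)   # 3-pixel-wide single character
    padded = pad_to_min_width(narrow)           # zero-padded
    print(padded.shape)
```

Whichever mode is used, it is worth checking recognition accuracy on the padded images separately, since the network was not trained on them.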

xinli94 commented 5 years ago

Got it. Thanks so much!