weinman / cnn_lstm_ctc_ocr

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR
GNU General Public License v3.0

Can't recognize consecutive identical characters #9

RaisingSun closed this issue 6 years ago

RaisingSun commented 6 years ago

Hi, I read your excellent paper and used your code to run some experiments. However, I found that it cannot recognize consecutive characters when they are the same. For example, "good" is recognized as "god". Could you please help me with this problem? Thanks.

weinman commented 6 years ago

The model should learn to emit the blank character between repeats so the CTC decoding doesn't collapse them.
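To make the collapsing rule concrete, here is a minimal sketch of greedy CTC post-processing in plain Python. It is not code from this repository; the `-` symbol standing in for the blank class and the function name `ctc_collapse` are assumptions chosen for illustration.

```python
from itertools import groupby

BLANK = "-"  # hypothetical symbol standing in for the CTC blank class


def ctc_collapse(path, blank=BLANK):
    """Collapse a per-frame symbol path the way CTC decoding does.

    Step 1: merge each run of identical consecutive symbols into one.
    Step 2: drop every blank that remains.
    """
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(symbol for symbol in merged if symbol != blank)


# Without a blank between the two o's, the run "ooo" merges into a single "o".
print(ctc_collapse("ggoood"))   # god

# A blank between the o's splits the run, so both survive the merge step.
print(ctc_collapse("ggo-ood"))  # good
```

The second path is the kind of per-frame output the network has to produce for "good" to be decoded correctly.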

RaisingSun commented 6 years ago

Thanks for your response. Do you mean I have to add a blank character between repeats in the label?

Parshwa27 commented 5 years ago

@RaisingSun I have the same issue. Can you help?

weinman commented 5 years ago

@RaisingSun No, you do not add the blank in the label. If your training schedule is sufficient, the model will learn to emit the blank between repeated characters so that the repeat survives collapsing. Training the model is very sensitive to local minima. See e.g. https://github.com/weinman/cnn_lstm_ctc_ocr/issues/42#issuecomment-428791521
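A related point that may help when debugging: because the label stays as plain "good" and the blank is only ever emitted by the network, every valid alignment needs at least one extra frame for each adjacent repeated pair. The sketch below computes that lower bound. The helper name `min_ctc_frames` is hypothetical and is not part of this repository; it only illustrates the general CTC constraint.

```python
def min_ctc_frames(label):
    """Smallest number of output frames that can align to `label` under CTC.

    Each character needs one frame, and each pair of adjacent identical
    characters needs one additional frame for the blank that separates them.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


print(min_ctc_frames("god"))   # 3
print(min_ctc_frames("good"))  # 5
```

If the output sequence for an image is shorter than this bound, no alignment exists for that label, regardless of how long the model is trained.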