weinman / cnn_lstm_ctc_ocr

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR
GNU General Public License v3.0

feature extract using CNN with unconstrained length image #7

Closed ghhong1986 closed 6 years ago

ghhong1986 commented 7 years ago

hi weinman, I have read the paper and the code and tried to understand them, but a few questions still confuse me. Please help me.

  1. We feed the input data with bucket_by_sequence_length and the parameter dynamic_pad set to True, so every batch has a fixed shape, but different batches may have different shapes (roughly as in the sketch after this list). How does the CNN in the model handle this?
  2. How should an inference service be written when the input images have different widths?
  3. Is there any theory behind the sequence-length calculation at the end of the convnet layers? Thanks.
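
For context, here is roughly how the data is batched. The shapes and bucket boundaries are hypothetical, and I am assuming the TF 1.x tf.contrib.training.bucket_by_sequence_length API:

```python
import tensorflow as tf

# Hypothetical per-example tensors from the input pipeline: height and depth
# are fixed, only the width varies from image to image.
image = tf.placeholder(tf.float32, shape=[32, None, 1])  # H x W x C
width = tf.shape(image)[1]                               # true, unpadded width

# With dynamic_pad=True, every image in a batch is padded to the widest image
# of that batch, so each batch has a fixed shape even though different batches
# may have different widths. The returned widths keep each example's original
# width so later stages can ignore the padded columns.
widths, outputs = tf.contrib.training.bucket_by_sequence_length(
    input_length=width,
    tensors=[image],
    batch_size=16,
    bucket_boundaries=[64, 128, 256],  # hypothetical width buckets
    dynamic_pad=True,
    allow_smaller_final_batch=True)
padded_images = outputs[0]  # shape: [batch, 32, max_width_in_batch, 1]
```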
weinman commented 7 years ago
  1. The CNN is applied to the entire image, including the padding. The CTC layer is given the sequence length; I am not entirely certain whether the gradient for the padded region is included in the loss function, but even if it is, it should not contribute much because it should not change the loss (due to the sequence-length restriction). See the CTC sketch after this list.

  2. I am not sure what you are asking. The CTC layer asks for the sequence length precisely so that it does not use the irrelevant logits that arise from the padding.

  3. This calculation captures how the downsampling and padding change the original, unpadded image width, so that you know which timesteps are valid in the final 1-D sequence. I make the calculation step by step so it is easier to identify (and verify) what the image width is after each layer's transformation (sketched below).
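
On point 3, here is a hypothetical walk-through of that bookkeeping. The kernel widths, strides, and padding modes below are illustrative rather than the model's actual layers; the formulas are the standard TensorFlow output-size rules:

```python
def layer_output_width(width, kernel_width, stride, padding):
    """Width after one conv or pool layer (standard TF output-size rules)."""
    if padding == 'SAME':
        return (width + stride - 1) // stride           # ceil(width / stride)
    elif padding == 'VALID':
        return (width - kernel_width) // stride + 1     # floor((w - k) / s) + 1
    raise ValueError('unknown padding: %s' % padding)


def sequence_length(image_width):
    """Track the unpadded width layer by layer; the result is the number of
    valid timesteps in the final 1-D feature sequence for this example."""
    w = image_width
    w = layer_output_width(w, kernel_width=3, stride=1, padding='SAME')   # conv1
    w = layer_output_width(w, kernel_width=2, stride=2, padding='VALID')  # pool1
    w = layer_output_width(w, kernel_width=3, stride=1, padding='SAME')   # conv2
    w = layer_output_width(w, kernel_width=2, stride=2, padding='VALID')  # pool2
    return w


print(sequence_length(100))  # a 100-pixel-wide image -> 25 valid timesteps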
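
And on points 1 and 2, a minimal sketch of how that sequence length gates the loss, assuming time-major logits and TF 1.x's tf.nn.ctc_loss; the shapes here are hypothetical:

```python
import tensorflow as tf

num_classes = 64   # hypothetical alphabet size, including the CTC blank
max_time = 100     # padded number of timesteps after the convnet/LSTM
batch_size = 4

# Time-major logits from the recurrent layers: [max_time, batch, classes].
logits = tf.placeholder(tf.float32, [max_time, batch_size, num_classes])
# Ground-truth character indices as a sparse tensor, as tf.nn.ctc_loss expects.
labels = tf.sparse_placeholder(tf.int32)
# Number of *valid* timesteps per example, i.e. excluding padded columns.
seq_len = tf.placeholder(tf.int32, [batch_size])

# CTC only marginalizes over the first seq_len[i] frames of example i, so
# logits produced over the padded region should not affect the loss value.
loss = tf.nn.ctc_loss(labels=labels,
                      inputs=logits,
                      sequence_length=seq_len,
                      time_major=True)
total_loss = tf.reduce_mean(loss)
```

The same idea carries over to decoding at inference time: tf.nn.ctc_beam_search_decoder also takes a sequence_length argument, so padding introduced by batching variable-width images is likewise ignored.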