weinman / cnn_lstm_ctc_ocr

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR
GNU General Public License v3.0

feature extract using CNN with unconstrained length image #7

Closed ghhong1986 closed 6 years ago

ghhong1986 commented 7 years ago

hi weinman, I have read the paper and the code and tried to understand them, but a few questions still confuse me. Please help me.

  1. We feed the input data with bucket_by_sequence_length and the parameter dynamic_pad set to True, so every batch has a fixed shape, but different batches may have different shapes (roughly as in the sketch after this list). How does the CNN in the model handle this?
  2. How should an inference service be written when the input images have different widths?
  3. Is there any theory behind the sequence-length calculation at the end of the convnet layers? Thanks.
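
For context, here is roughly how the data is batched. The shapes and bucket boundaries are hypothetical, and I am assuming the TF 1.x tf.contrib.training.bucket_by_sequence_length API:

```python
import tensorflow as tf

# Hypothetical per-example tensors from the input pipeline: height and depth
# are fixed, only the width varies from image to image.
image = tf.placeholder(tf.float32, shape=[32, None, 1])  # H x W x C
width = tf.shape(image)[1]                               # true, unpadded width

# With dynamic_pad=True, every image in a batch is padded to the widest image
# of that batch, so each batch has a fixed shape even though different batches
# may have different widths. The returned widths keep each example's original
# width so later stages can ignore the padded columns.
widths, outputs = tf.contrib.training.bucket_by_sequence_length(
    input_length=width,
    tensors=[image],
    batch_size=16,
    bucket_boundaries=[64, 128, 256],  # hypothetical width buckets
    dynamic_pad=True,
    allow_smaller_final_batch=True)
padded_images = outputs[0]  # shape: [batch, 32, max_width_in_batch, 1]
```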
weinman commented 7 years ago
  1. The CNN is applied to the entire image, including the padding. The CTC layer is given the sequence length; I am not entirely certain whether the gradient for the padded region is included in the loss function, but even if it is, it should not contribute much because it should not change the loss (due to the sequence-length restriction). See the CTC sketch after this list.

  2. I am not sure what you are asking. The CTC layer asks for the sequence length precisely so that it does not use the irrelevant logits that arise from the padding.

  3. This calculation captures how the downsampling and padding change the original, unpadded image width, so that you know which timesteps are valid in the final 1-D sequence. I make the calculation step by step so it is easier to identify (and verify) what the image width is after each layer's transformation (sketched below).
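
On point 3, here is a hypothetical walk-through of that bookkeeping. The kernel widths, strides, and padding modes below are illustrative rather than the model's actual layers; the formulas are the standard TensorFlow output-size rules:

```python
def layer_output_width(width, kernel_width, stride, padding):
    """Width after one conv or pool layer (standard TF output-size rules)."""
    if padding == 'SAME':
        return (width + stride - 1) // stride           # ceil(width / stride)
    elif padding == 'VALID':
        return (width - kernel_width) // stride + 1     # floor((w - k) / s) + 1
    raise ValueError('unknown padding: %s' % padding)


def sequence_length(image_width):
    """Track the unpadded width layer by layer; the result is the number of
    valid timesteps in the final 1-D feature sequence for this example."""
    w = image_width
    w = layer_output_width(w, kernel_width=3, stride=1, padding='SAME')   # conv1
    w = layer_output_width(w, kernel_width=2, stride=2, padding='VALID')  # pool1
    w = layer_output_width(w, kernel_width=3, stride=1, padding='SAME')   # conv2
    w = layer_output_width(w, kernel_width=2, stride=2, padding='VALID')  # pool2
    return w


print(sequence_length(100))  # a 100-pixel-wide image -> 25 valid timesteps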
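
And on points 1 and 2, a minimal sketch of how that sequence length gates the loss, assuming time-major logits and TF 1.x's tf.nn.ctc_loss; the shapes here are hypothetical:

```python
import tensorflow as tf

num_classes = 64   # hypothetical alphabet size, including the CTC blank
max_time = 100     # padded number of timesteps after the convnet/LSTM
batch_size = 4

# Time-major logits from the recurrent layers: [max_time, batch, classes].
logits = tf.placeholder(tf.float32, [max_time, batch_size, num_classes])
# Ground-truth character indices as a sparse tensor, as tf.nn.ctc_loss expects.
labels = tf.sparse_placeholder(tf.int32)
# Number of *valid* timesteps per example, i.e. excluding padded columns.
seq_len = tf.placeholder(tf.int32, [batch_size])

# CTC only marginalizes over the first seq_len[i] frames of example i, so
# logits produced over the padded region should not affect the loss value.
loss = tf.nn.ctc_loss(labels=labels,
                      inputs=logits,
                      sequence_length=seq_len,
                      time_major=True)
total_loss = tf.reduce_mean(loss)
```

The same idea carries over to decoding at inference time: tf.nn.ctc_beam_search_decoder also takes a sequence_length argument, so padding introduced by batching variable-width images is likewise ignored.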