solivr / tf-crnn

TensorFlow convolutional recurrent neural network (CRNN) for text recognition
GNU General Public License v3.0
292 stars, 98 forks

train Synth90K error #12

Closed ghost closed 6 years ago

ghost commented 7 years ago

Error information: tensorflow.python.framework.errors_impl.InvalidArgumentError: Less leaves in the beam search than requested.

I think the problem is a CTC beam search error, but I don't know how to solve it.

cipri-tom commented 6 years ago

I had an error in the CTC loss that was due to bad data, where the image was much shorter than its label, e.g. the image showed "he" but the label was "he ate his sandwich in the park".

Check for this kind of inconsistency in the data, and also the other way around (a long image with a label that is too short).

solivr commented 6 years ago

Like @cipri-tom, I also saw this type of error when images were shorter than their labels. You should first check that every image is converted to at least as many frames as there are characters in the corresponding text label.

J-Escher commented 6 years ago

I think I'm having a similar kind of error - the sequence lengths of the output are longer than the input time when training on the synth dataset. Are there some mislabelled images in the synth 90k dataset and should there be checks during runtime to catch this? @solivr Can you clarify what you mean by checking that images are converted to the same number of frames as characters in corresponding text labels?

solivr commented 6 years ago

What I mean is that if you have the word "example", which has 7 characters, the output of the convolutional layers should be a feature map with at least 7 columns (corresponding to 7 frames), so your original image should have a width greater than or equal to 7*4=28.
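
The width rule above can be sketched as a small pre-training check. This is only an illustration: the downsampling factor of 4 is taken from the 7*4=28 example, the `samples` list of (path, width, label) tuples is a hypothetical stand-in for however the dataset is indexed, and the width is assumed to be the one the network actually receives after any resizing.

```python
# Assumption: the convolutional layers reduce the image width by a factor of 4,
# as in the 7*4=28 example above. Adjust DOWNSAMPLING for a different network.
DOWNSAMPLING = 4


def min_width(label, downsampling=DOWNSAMPLING):
    """Smallest image width that yields at least one frame per character."""
    return len(label) * downsampling


def find_too_narrow(samples, downsampling=DOWNSAMPLING):
    """Return the (path, width, label) entries whose image is too narrow.

    `samples` is a hypothetical list of (path, width_in_pixels, label) tuples.
    """
    return [s for s in samples if s[1] < min_width(s[2], downsampling)]


samples = [
    ("a.jpg", 100, "example"),  # 100 >= 28, fine
    ("b.jpg", 20, "example"),   # 20 < 28, too narrow
    ("c.jpg", 12, "he ate his sandwich in the park"),
]

print(min_width("example"))  # 28
print([path for path, _, _ in find_too_narrow(samples)])  # ['b.jpg', 'c.jpg']
```

Running such a check once over the annotation file, before training, would list the entries worth inspecting or dropping.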

J-Escher commented 6 years ago

Do you think there is a straightforward way to catch and mask examples that violate this during runtime, so that the model is robust against mislabelled images? When training on the synth 90k images I suspect there are some mislabelled images, as tf.nn.ctc_loss complains that sequence length > time.

@solivr is there a way to retrieve the length of the ground truth sequences? I would like to mask bad entries with sequence_lengths > time.
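
One possible way to get the label lengths, sketched in plain Python rather than as the repository's actual code: if the labels are stored in sparse form as (row, position) index pairs, the length of each ground-truth sequence is the number of entries in its row. The function names and the sample batch below are hypothetical, and the same counting would need to be expressed with TensorFlow ops to run inside the graph.

```python
def label_lengths(sparse_indices, batch_size):
    """Count label characters per batch row from sparse (row, position) indices."""
    lengths = [0] * batch_size
    for row, _position in sparse_indices:
        lengths[row] += 1
    return lengths


def valid_mask(lengths, frames):
    """True where the label fits into the available time frames."""
    return [n <= t for n, t in zip(lengths, frames)]


# Hypothetical batch of 3 samples: labels with 2, 5 and 3 characters.
indices = [(0, 0), (0, 1),
           (1, 0), (1, 1), (1, 2), (1, 3), (1, 4),
           (2, 0), (2, 1), (2, 2)]
frames = [8, 4, 3]  # time steps left after the convolutional layers

lengths = label_lengths(indices, batch_size=3)
mask = valid_mask(lengths, frames)

print(lengths)  # [2, 5, 3]
print(mask)     # [True, False, True]
```

Entries where the mask is False are the ones with a label longer than the available time steps, so they are the candidates to drop from the batch before computing the loss.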

ghost commented 6 years ago

I found that we can comment out the validation code and leave the training code in place, and training will then continue. We can then validate the images one by one to avoid the ones with invalid lengths.
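
Validating one image at a time could look roughly like the sketch below. It assumes some callable that runs validation on a single sample and raises on failure. `find_failing_samples` and `fake_validate` are hypothetical names used only for illustration, and `fake_validate` merely imitates the length error instead of calling the real model.

```python
def find_failing_samples(samples, validate_fn):
    """Run validate_fn on each sample alone and collect the ones that raise."""
    failing = []
    for sample in samples:
        try:
            validate_fn(sample)
        except Exception as err:  # in practice, catch the specific error type
            failing.append((sample, str(err)))
    return failing


# Stand-in for a real single-image validation step (hypothetical).
def fake_validate(sample):
    path, frames, label = sample
    if len(label) > frames:
        raise ValueError("label longer than available frames")


samples = [("a.jpg", 10, "hello"), ("b.jpg", 2, "hello")]

for (path, _, _), message in find_failing_samples(samples, fake_validate):
    print(path, message)  # b.jpg label longer than available frames
```

The collected paths can then be excluded from the validation set so that the full validation step can be re-enabled.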