solivr / tf-crnn

TensorFlow convolutional recurrent neural network (CRNN) for text recognition
GNU General Public License v3.0
292 stars 98 forks

Longer outputs than inputs when training on word datasets #21

Closed · J-Escher closed this 6 years ago

J-Escher commented 6 years ago

When training on word-recognition datasets, the target sequences are often longer than the input sequences derived from the images, and TensorFlow raises an error like `Not enough time for target transition sequence (required: x, available: y)`. I suspect this is a problem with the dataset itself, i.e. the target sequence is somehow longer than the CTC input sequence extracted from the image, perhaps because some items in the dataset are mislabelled. Does anyone else have a similar issue?
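For what it's worth, one way to test the mislabelling theory is to scan the dataset for samples whose transcription cannot fit in the CTC output sequence. The sketch below is my own, not the repo's code: the downsampling factor of 4 is an assumption about the CNN's cumulative horizontal pooling and should be set to whatever the model actually uses, and the `(path, label)` sample format is hypothetical.

```python
from PIL import Image

DOWNSAMPLE = 4  # assumed cumulative horizontal pooling factor of the CNN

def min_ctc_steps(label):
    """Minimum number of CTC time steps needed to emit `label`:
    one step per character, plus one blank between repeated characters."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

def find_bad_samples(samples):
    """`samples` is an iterable of (image_path, transcription) pairs.
    Yields samples whose label needs more time steps than the image
    provides, i.e. exactly the 'required > available' condition."""
    for path, label in samples:
        width = Image.open(path).size[0]
        available = width // DOWNSAMPLE
        required = min_ctc_steps(label)
        if required > available:
            yield path, label, required, available
```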

I can get around this issue by setting `ignore_longer_outputs_than_inputs=True` in `tf.nn.ctc_loss`, but then I get a cryptic error message, `check failed: start <= limit (x vs. y)`, which makes me think the two are related.
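For reference, a minimal TF 1.x-style call with that flag might look like the sketch below (toy shapes and values, not the repo's training code). Note that the flag only gives zero loss to samples whose label is longer than the available time steps; it silences the error rather than fixing the data.

```python
import tensorflow as tf  # TF 1.x API, matching the repo at the time

batch, max_time, num_classes = 2, 10, 28           # toy sizes
logits = tf.zeros([max_time, batch, num_classes])  # time-major dummy logits
labels = tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0]],
                         values=[3, 3, 7],         # toy transcriptions
                         dense_shape=[batch, 2])
seq_len = tf.constant([10, 10], dtype=tf.int32)    # per-image time steps

# With the flag set to True, over-long labels yield zero loss
# instead of aborting training.
loss = tf.nn.ctc_loss(labels, logits, seq_len,
                      ignore_longer_outputs_than_inputs=True)
```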

The regular conjoined-MNIST examples and datasets containing only digit sequences train fine, but the synthetic dataset and other word-recognition datasets are plagued by this problem.

Saicat commented 6 years ago

I am using the IAM English handwritten-line dataset, and I got the `check failed: start <= limit (x vs. y)` error (your second problem) after two or three training epochs. I thought it might be a problem with the image aspect ratio (the width not being long enough), so I resized the images to be wider before feeding them to the model, but unfortunately the same error occurred even earlier (in the first training epoch). I then made the image height larger, and the error occurred later (after 10 training epochs). It is strange.
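The usual relationship is that the number of CTC time steps grows with image width (the height is fixed by the network's input size), so widening images should indeed add time steps. A small sketch of what I mean by resizing, preserving aspect ratio and enforcing a minimum width; this is my own code with hypothetical defaults, not the repo's preprocessing:

```python
import tensorflow as tf  # TF 1.x API

def resize_keep_ratio(image, target_height=32, min_width=64):
    """Resize to a fixed height while preserving aspect ratio, and
    enforce a minimum width so the CTC gets enough time steps.
    Sketch only; tf-crnn has its own resizing/padding logic."""
    h = tf.cast(tf.shape(image)[0], tf.float32)
    w = tf.cast(tf.shape(image)[1], tf.float32)
    new_w = tf.cast(tf.round(w * target_height / h), tf.int32)
    new_w = tf.maximum(new_w, min_width)
    return tf.image.resize_images(image, [target_height, new_w])
```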

solivr commented 6 years ago

Concerning `check failed: start <= limit (x vs. y)`: sorry for the late reply, this was not easy to replicate and track down... I think the problem comes from the `random_rotation` function. I added a condition checking that the tensor is sliced properly, and I don't get the error anymore. I'll update the code soon, but for the moment, if you're hitting this error, comment out this line in the data augmentation function.
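To illustrate the kind of guard I mean: when a rotated image is cropped back to its content, the computed crop box can become degenerate for wide images or large angles, which is exactly when a slice ends up with `start > limit`. The sketch below is not the repo's exact implementation; the small-angle crop formula and the `tf.maximum` clamps are my own illustration of such a condition.

```python
import tensorflow as tf  # TF 1.x API (tf.contrib is TF 1.x only)

def rotate_and_safe_crop(image, max_angle=0.1):
    """Rotate by a small random angle, then crop an axis-aligned box
    that stays inside the rotated content. Sketch only."""
    angle = tf.random_uniform([], -max_angle, max_angle)
    rotated = tf.contrib.image.rotate(image, angle,
                                      interpolation='BILINEAR')
    h = tf.cast(tf.shape(image)[0], tf.float32)
    w = tf.cast(tf.shape(image)[1], tf.float32)
    # Conservative inner box for small angles; can go negative for
    # very wide images, which is the degenerate case to guard against.
    abs_a = tf.abs(angle)
    new_h = h * tf.cos(abs_a) - w * tf.sin(abs_a)
    new_w = w * tf.cos(abs_a) - h * tf.sin(abs_a)
    # The guard: clamp so the slice start can never exceed its limit.
    new_h = tf.maximum(tf.cast(new_h, tf.int32), 1)
    new_w = tf.maximum(tf.cast(new_w, tf.int32), 1)
    off_h = (tf.shape(image)[0] - new_h) // 2
    off_w = (tf.shape(image)[1] - new_w) // 2
    return tf.image.crop_to_bounding_box(rotated, off_h, off_w,
                                         new_h, new_w)
```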