weinman / cnn_lstm_ctc_ocr

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR
GNU General Public License v3.0

Squeeze function error #33

Closed mhsamavatian closed 5 years ago

mhsamavatian commented 5 years ago

There is no access to the mjsynth dataset. How can I train the model with the IAM dataset?

weinman commented 5 years ago

Can you explain what you mean by "there is no access to mjsynth dataset"? To download it, follow the training instructions.

To train the model on your own dataset, you need to generate image, label pairs for your data (e.g., tensors and sparse tensors) and connect them in whatever way seems suitable (e.g., placeholder+feed dict, TFRecord files of Examples, etc.) within a variant of train.py.
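For a rough idea, writing such pairs to a TFRecord file can look like the sketch below (the feature keys here are placeholders, not necessarily the ones this repo's reader expects):

```python
import tensorflow as tf

# Rough sketch of writing image/label pairs as TFRecord Examples.
# The feature keys are illustrative placeholders.
def write_example(writer, png_bytes, label_indices, width):
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[png_bytes])),
        'image/labels': tf.train.Feature(
            int64_list=tf.train.Int64List(value=label_indices)),
        'image/width': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[width])),
    }))
    writer.write(example.SerializeToString())

with tf.python_io.TFRecordWriter('words-train.tfrecord') as writer:
    # for each word image: read the PNG bytes, map its transcription to
    # integer character indices, then write one Example
    pass  # write_example(writer, png_bytes, label_indices, width)
```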

I'll be merging an update of the code that uses the Dataset API and the Estimator framework soon, with a more generalized input pipeline. That won't solve your problems directly, but it may give some additional flexibility that could ease the process of adapting to other data/tasks.

mhsamavatian commented 5 years ago

I got the dataset; their server was out of service for some hours. I modified the make-tfrecord script to create the TFRecords for the IAM dataset and feed them directly to your model, but I got an error during training: Tried to explicitly squeeze dimension 1 but dimension was not 1. I think it relates to the line features = tf.squeeze(pool8, axis=1, name='features') # squeeze row dim. It expects the inputs to have shape [?, 32, ?, 1], but the shape is (32, ?, ?, 1) when I print it. I did not change anything in your code; I just made the annotation.txt for the IAM dataset and fed it to your script to make the TFRecords. Is this a bug, or is it because of my IAM dataset images?

In the bucketed_input_pipeline function, the comment says image : float32 image tensor [batch_size 32 ? 1] padded to batch max width, but it returns images of shape [32, ?, ?, 1]. Would you please help me with this issue?

mhsamavatian commented 5 years ago

I checked with the mjsynth dataset and the shape was also (32, ?, ?, 1). Based on what I saw in the code, the image width should be 32, but the min width is set to 20 in the TFRecord maker script. I set it to 32 and updated kernel_size to [3,3,3,3,3,3] (it was [5,5,3,...]), but it still gives me the same error. Is it supposed to have a fixed width of 32 for all images? Based on the README, the second dim is the image height, which is supposed to be fixed at 32, right? I have no control over the IAM dataset. Should I resize all of its images to a height of 32?

weinman commented 5 years ago

The image width can vary, but the code assumes the image height is exactly 32, so that after the four levels of vertical max-pooling, the height ends up being one.
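In other words (a toy sketch, not the actual model.py layer stack), the final squeeze only works once the pooling has brought the height axis down to exactly 1:

```python
import tensorflow as tf

# Illustration only (not the repo's exact layers): the squeeze at the end
# succeeds only if the height axis has already been pooled down to 1.
x = tf.placeholder(tf.float32, [None, 32, None, 64])              # NHWC
h = tf.layers.max_pooling2d(x, pool_size=[2, 1], strides=[2, 1])  # 32 -> 16
# ... further conv/pool layers must reduce the height from 16 to 1 ...
# features = tf.squeeze(h, axis=1)  # raises "Tried to explicitly squeeze
#                                   # dimension 1 but dimension was not 1"
#                                   # while the height is still > 1
```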

The data order of the inputs tensor coming into model.convnet_layers should be NHWC (batch size, height, width, channels). The raw mjsynth data was 31 pixels high, but preprocessing in mjsynth.py repeats a row to make it an "even" 32 pixels.
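Something along these lines (a sketch of the idea, not necessarily the exact code in mjsynth.py):

```python
import tensorflow as tf

# Duplicate one row so a 31-pixel-high image becomes 32 pixels high.
image = tf.placeholder(tf.uint8, [31, None, 1])         # HWC, height 31
padded = tf.concat([image, image[-1:, :, :]], axis=0)   # height 32
```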

It's not a mistake in the comment. Since the default batch size in train.py is also 32, it may be that the shape reported to you reflects the inferred fixed batch size but not the image height.

If you have a fixed image height but of a value other than 32, you could try a different strategy for collapsing data vertically (e.g., adjust the pool8 layer op). If your data is not height normalized (or not height normalizable), you'll have to get even more creative.
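For example (a sketch under assumed shapes, not a drop-in patch; the height of 4 remaining before the final pooling is an assumption for illustration):

```python
import tensorflow as tf

# Two hypothetical ways to collapse the remaining height dimension.
pool7 = tf.placeholder(tf.float32, [None, 4, None, 512])  # assumed shape

# Option 1: size the final pooling to consume the remaining height,
# then squeeze as before.
pool8 = tf.layers.max_pooling2d(pool7, pool_size=[4, 1], strides=[4, 1],
                                padding='valid', name='pool8')
features = tf.squeeze(pool8, axis=1, name='features')

# Option 2: reduce over the height axis whatever its size; no squeeze needed.
features_alt = tf.reduce_max(pool7, axis=1, name='features_alt')
```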

The primary reason for enforcing a minimum width in the script that generates the Examples for the TFRecord files is that plenty of images in the mjsynth dataset do not have enough pixels in width to compensate for the horizontal data pooling (leaving a feature sequence shorter than the label sequence, which CTC can't process).
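Roughly speaking (the exact downsampling factor depends on the conv/pool strides in model.py, so treat this as a sketch), the check amounts to:

```python
# Back-of-the-envelope filter: with an assumed horizontal downsampling factor
# of 4, an image of width W yields roughly W // 4 feature frames, and CTC
# needs at least as many frames as there are label characters.
def wide_enough(image_width, label_length, horizontal_downsample=4):
    return image_width // horizontal_downsample >= label_length
```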