weinman / cnn_lstm_ctc_ocr

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR
GNU General Public License v3.0

Computing real sequence length #11

Closed wellescastro closed 6 years ago

wellescastro commented 6 years ago

Hi! I have a simple question about calculating the sequence length after the conv and pool layers. In the following code, why do you compute the sequence length only up to the fourth pooling op (after_pool4)?

conv1 = conv_layer(inputs, layer_params[0], training ) # 30,30
conv2 = conv_layer( conv1, layer_params[1], training ) # 30,30
pool2 = pool_layer( conv2, 2, 'valid', 'pool2')        # 15,15
conv3 = conv_layer( pool2, layer_params[2], training ) # 15,15
conv4 = conv_layer( conv3, layer_params[3], training ) # 15,15
pool4 = pool_layer( conv4, 1, 'valid', 'pool4' )       # 7,14
conv5 = conv_layer( pool4, layer_params[4], training ) # 7,14
conv6 = conv_layer( conv5, layer_params[5], training ) # 7,14
pool6 = pool_layer( conv6, 1, 'valid', 'pool6')        # 3,13
conv7 = conv_layer( pool6, layer_params[6], training ) # 3,13
conv8 = conv_layer( conv7, layer_params[7], training ) # 3,13
pool8 = tf.layers.max_pooling2d( conv8, [3,1], [3,1], 
                           padding='valid', name='pool8') # 1,13

features = tf.squeeze(pool8, axis=1, name='features') # squeeze row dim

kernel_sizes = [ params[1] for params in layer_params]

#Calculate resulting sequence length from original image widths
conv1_trim = tf.constant( 2 * (kernel_sizes[0] // 2),
                          dtype=tf.int32,
                          name='conv1_trim')
one = tf.constant(1, dtype=tf.int32, name='one')
two = tf.constant(2, dtype=tf.int32, name='two')
after_conv1 = tf.subtract( widths, conv1_trim)
after_pool2 = tf.floor_div( after_conv1, two )
after_pool4 = tf.subtract(after_pool2, one)
sequence_length = tf.reshape(after_pool4,[-1], name='seq_len') # Vectorize
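For reference, the width arithmetic in the graph ops above can be sketched in plain Python (a sketch; a first-layer kernel size of 3 is assumed, which is what the `2 * (kernel_sizes[0] // 2)` trim implies given the shape comments):

```python
def sequence_length(width, first_kernel=3):
    """Mirror the graph ops: conv1 trims the valid-conv border,
    pool2 halves the width, pool4 removes one more column."""
    conv1_trim = 2 * (first_kernel // 2)  # border lost to the valid conv1
    after_conv1 = width - conv1_trim      # tf.subtract(widths, conv1_trim)
    after_pool2 = after_conv1 // 2        # tf.floor_div(after_conv1, two)
    after_pool4 = after_pool2 - 1         # tf.subtract(after_pool2, one)
    return after_pool4

# A 32-pixel-wide input follows the shape comments: 30 -> 15 -> 14
print(sequence_length(32))  # 14
```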
wellescastro commented 6 years ago

I just realized that you apply the later pooling ops only along the height axis.

Closing it.
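That height-only behavior is visible in the shape arithmetic: a sketch below checks the last pooling op, assuming valid padding with kernel [3,1] and stride [3,1] as in pool8.

```python
def pool_out(size, kernel, stride):
    """Output size of a valid-padded pooling op along one axis."""
    return (size - kernel) // stride + 1

# pool8 uses kernel [3,1] and stride [3,1] on a 3x13 feature map,
# collapsing the height to 1 while leaving the width (the sequence
# dimension fed to the LSTM) untouched:
h, w = 3, 13
print(pool_out(h, 3, 3), pool_out(w, 1, 1))  # 1 13
```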

weinman commented 6 years ago

Thanks for the observations. That's a good call for an explanatory comment in that part of the code.

wellescastro commented 6 years ago

Thank you. Just one observation: the first conv is supposed to use padding='valid', right? At least that is my understanding after reading the comments (# 30,30) and the table in the repository README. However, the first padding given in layer_params is 'same' instead of 'valid'.
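For context, the difference in output width between the two padding modes follows from the standard output-size formulas (a sketch; a 3x3 kernel with stride 1 is assumed from the shape comments):

```python
import math

def conv_out(width, kernel, stride=1, padding='valid'):
    """Standard conv output-size formulas for 'same' vs 'valid' padding."""
    if padding == 'same':
        return math.ceil(width / stride)          # size preserved at stride 1
    return math.ceil((width - kernel + 1) / stride)

# With a 32-wide input and a 3-wide kernel:
print(conv_out(32, 3, padding='same'))   # 32 (no border trim)
print(conv_out(32, 3, padding='valid'))  # 30 (matches the "# 30,30" comment)
```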

weinman commented 6 years ago

Good catch. You're right; that should probably be changed to 'valid' to match the intent and the other comments/code. Thanks!

weinman commented 6 years ago

@wellescastro, I updated this and fixed the downstream sequence length calculation (this was a discrepancy I had noticed earlier but hadn't figured out yet) in (latest) commit e85d6dccac6e92492b8445b7c9e43ebe474bfe76.

Everything seems to be working (though I haven't finished testing any performance implications). Do let me know if you find otherwise.

wellescastro commented 6 years ago

Thank you, Dr. Weinman. Your code is very interesting; it is the only one I found on GitHub that uses bucketing by sequence length for an HWR task. I'm going to try to use it for handwritten text line recognition on the IAM database :)