weinman / cnn_lstm_ctc_ocr

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR
GNU General Public License v3.0

sequence length problem #15

Closed qwzhong1988 closed 6 years ago

qwzhong1988 commented 6 years ago

Hi, shouldn't the sequence-length calculation in calc_seq_len() (mjsynth-tfrecord.py) match the downsampling in convnet_layers() (model.py)? Maybe something like this:

import model

# Kernel sizes come from the model definition, so the trim stays in sync
kernel_sizes = [ params[1] for params in model.layer_params ]

def calc_seq_len(image_width):
    conv1_trim = 2 * (kernel_sizes[0] // 2)  # columns lost to the first VALID conv
    after_conv1 = image_width - conv1_trim
    after_pool2 = after_conv1 // 2           # this pool halves the width
    after_pool4 = after_pool2 - 1            # this pool drops one column
    after_pool6 = after_pool4 - 1            # this pool drops one column
    after_pool8 = after_pool6                # final pool leaves the width unchanged
    sequence_length = after_pool8
    return sequence_length
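For illustration, the arithmetic above can be traced with a self-contained sketch. Note this is not the repository's code: the first-layer kernel size of 3 and the interpretation of each pooling step are assumptions made here so the function runs standalone, without importing model.

```python
def calc_seq_len(image_width, first_kernel_size=3):
    """Columns remaining after the conv/pool stack.

    Sketch only: the real kernel size comes from model.layer_params;
    3 is an assumed value for demonstration.
    """
    conv1_trim = 2 * (first_kernel_size // 2)  # a VALID conv trims k//2 per side
    after_conv1 = image_width - conv1_trim
    after_pool2 = after_conv1 // 2             # halving pool
    after_pool4 = after_pool2 - 1              # pool removing one column
    after_pool6 = after_pool4 - 1              # pool removing one column
    after_pool8 = after_pool6                  # width-preserving final pool
    return after_pool8

# Tracing a 32-pixel-wide image: 32 -> 30 -> 15 -> 14 -> 13
print(calc_seq_len(32))  # prints 13
```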
weinman commented 6 years ago

Thanks for noting this.

The answer is: yes, most likely. This calculation was written before I changed the model architecture to its present form. (I never regenerated the tfrecords, because doing so is time- and disk-intensive.)

I believe the present version is no less conservative, in that it discards images whose sequence lengths are too short to produce the necessary character output. The kernel sizes differ as well.

It's something worth fixing. I'll close this when I get around to verifying the proper equivalence. (A pull request would be welcome as well, though not essential.)

weinman commented 6 years ago

Fixed in commit 02f8f26212f33158978742c57f5bc1a52801cab7. A merge of the complete overhaul represented by that branch is forthcoming.

pczzy commented 5 years ago

seq_lens = [calc_seq_len( w ) for w in range( 1024 )] raises an error when image_width > 1024 while I regenerate tfrecords for some Chinese characters. Can I change 1024 to 2048 safely?

weinman commented 5 years ago

Yes, I don't see that causing any problem.
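For the record, the change in question only widens the precomputed lookup table, which is indexed by pixel width. A minimal sketch of the idea, assuming a standalone calc_seq_len with an assumed first-layer kernel size of 3 (the real value comes from model.layer_params):

```python
MAX_WIDTH = 2048  # was 1024; wider images just need a longer table

def calc_seq_len(image_width, first_kernel_size=3):
    # Same downsampling arithmetic as in the thread above; kernel
    # size 3 is an assumption for this standalone sketch.
    conv1_trim = 2 * (first_kernel_size // 2)
    after_conv1 = image_width - conv1_trim
    after_pool2 = after_conv1 // 2
    return after_pool2 - 2  # two single-column pools; final pool is identity

# Precompute sequence length for every possible image width.
seq_lens = [calc_seq_len(w) for w in range(MAX_WIDTH)]
```

Since each entry is computed independently, extending the range changes nothing for widths below 1024; the table simply grows.
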