qjadud1994 / CRNN-Keras

CRNN (CNN+RNN) for OCR using Keras / License Plate Recognition
MIT License

Question about Image Generator #5

Closed: odgiv closed this issue 4 years ago

odgiv commented 6 years ago

Hi, you have done a great job, by the way. I am trying to understand the implementation of the model, and I have a question about line 55 of Image_Generator.py:

input_length = np.ones((self.batch_size, 1)) * (self.img_w // self.downsample_factor - 2)

Am I right that img_w is divided by downsample_factor because of the size and number of max-pooling layers applied? What I also don't get is why you subtract 2 from the result.

qjadud1994 commented 6 years ago

Yes, that's right.

downsample_factor = 4 means that the feature map is reduced by a factor of four along the width, because two 2x2 max-pools are applied (2 x 2 = 4).

The 2 is the number of leading RNN output timesteps that are discarded, since the first couple of RNN outputs tend to be garbage.

input = (batch, 128, 64, 1)
RNN output = (batch, 32, Class)   # 128 / 4 = 32
CTC input = (batch, 30, Class)    # 32 - 2 = 30
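The shape arithmetic above can be checked with a small NumPy sketch. This is only an illustration, not the repository's code: `batch_size`, `num_classes`, and `discarded_steps` are hypothetical names and values chosen for the example, the random array stands in for the real RNN output, and the slice is one assumed way of dropping the first two timesteps before the CTC loss.

```python
import numpy as np

# Values from the thread; batch_size and num_classes are arbitrary placeholders.
batch_size = 4
img_w = 128
downsample_factor = 4    # two 2x2 max-pools along the width: 2 * 2
discarded_steps = 2      # leading RNN timesteps dropped before CTC
num_classes = 10         # hypothetical class count

# One RNN timestep per downsampled column of the image.
rnn_steps = img_w // downsample_factor                      # 128 // 4 = 32
rnn_output = np.random.rand(batch_size, rnn_steps, num_classes)

# Drop the first couple of timesteps (assumed slicing, for illustration).
ctc_input = rnn_output[:, discarded_steps:, :]              # 32 - 2 = 30 steps

# Same expression as line 55 of Image_Generator.py, with self.* replaced by locals.
input_length = np.ones((batch_size, 1)) * (img_w // downsample_factor - discarded_steps)

print(rnn_output.shape, ctc_input.shape, input_length[0, 0])
```

The point of the check is that input_length has to match the number of timesteps actually handed to CTC (30 here), not the number the RNN produced (32).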

It is good to refer to this site, and I personally recommend that you understand CTC.

odgiv commented 6 years ago

Super thank you