tuanphan09 / captcha-recognition


y_pred interpretation #3

Closed justlike-prog closed 5 years ago

justlike-prog commented 5 years ago

First, thanks for your nicely working captcha recognition repo. I have a question regarding the CRNN:

1) In ctc_lambda_func you have iy_pred = iy_pred[:, 2:, :], but I always thought the prediction is two-dimensional, since it is a matrix holding, for each "slice" (time step) of the image, the probabilities of which char it could represent?

2) In predict.py, why do you loop over y_pred? Don't we always get just one prediction?

tuanphan09 commented 5 years ago
  1. It's because I'm training in batches, so there are 3 dimensions: [batch_size, time_steps, num_chars].
  2. No, they are predictions for all the files in list_files:
    testing_generator = CapchaDataGenerator(list_files, list_labels, batch_size=BATCH_SIZE, is_testing=True)
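To illustrate why the loop is needed: y_pred has one row per input image, so decoding iterates over the batch axis. Here is a minimal sketch in plain NumPy, with a hypothetical greedy_decode helper and an assumed digit character set and blank index (not the repo's actual decoding code):

```python
import numpy as np

CHARS = "0123456789"   # assumed character set
BLANK = len(CHARS)     # assumed CTC blank index (last class)

def greedy_decode(y_pred):
    """Greedy CTC decode for a batch: y_pred has shape
    [batch_size, time_steps, num_chars + 1]."""
    results = []
    for sample in y_pred:              # one (time_steps, num_chars + 1) matrix per image
        best = sample.argmax(axis=-1)  # most likely class at each time step
        out, prev = [], -1
        for c in best:
            if c != prev and c != BLANK:  # collapse repeats, drop blanks
                out.append(CHARS[c])
            prev = c
        results.append("".join(out))
    return results
```

Each element of the returned list is the decoded string for one file in list_files, which is why predict loops over y_pred.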
justlike-prog commented 5 years ago

Ah ok gotcha :) Thanks so much :)

One short question (I don't want to open another thread): I saw there is a downsample factor, but I am not sure why it is there. It is used here, where I also don't understand why the np.ones vectors are introduced. Why are they necessary?

return [X, y, np.ones(self.batch_size) * int(width / downsample_factor - 2), np.ones(self.batch_size) * label_len], y

Also, if I have bigger images than you are using, is there some parameter I should tune, or does the network behave just as well on bigger images? Where do I tune the number of time steps, for example?

PS: I think one can use tf.keras.backend.ctc_decode instead of coding the function from scratch :)

tuanphan09 commented 5 years ago
  1. downsample factor

Model: Image -> CNN -> LSTM -> CTC loss

downsample factor = 2 ^ (number of MaxPool2D layers in the CNN)

I use the downsample factor to determine the width of the final output: number of time steps in the LSTM = width of the CNN's output = image width // downsample_factor.

I print the CNN's output shape in model.py:

print(x_reshape.get_shape())  
  2. np.ones vectors

Because I'm training in batches, those np.ones vectors tell the CTC loss the input length and label length of each sample. My captchas all have the same number of chars, so every sample has the same length, but this setup allows you to train and predict captchas of different lengths.

  3. bigger images

You can change width and height in config.py based on your dataset. In order to calculate the CTC loss, the number of time steps must be >= 2 * label_len, so you may have to change the number of MaxPool2D layers in the CNN and update the downsample factor accordingly.

  4. tf.keras.backend.ctc_decode

Thanks, I already tried it, but there are things I like to fully control and understand, such as the confidence score, filtering the beam search with some tricks, etc.
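Tying points 1 and 2 together, here is a minimal sketch of how the generator's extra CTC inputs are computed. The concrete numbers (width=140, two MaxPool2D layers so downsample_factor=4, 5-char labels) are assumptions for illustration, not the repo's actual config values:

```python
import numpy as np

# Assumed example values, not the actual config.py settings.
batch_size, width, downsample_factor, label_len = 32, 140, 4, 5

# One length per sample in the batch. The "- 2" mirrors the
# iy_pred[:, 2:, :] slice that drops the first two time steps.
input_length = np.ones(batch_size) * int(width / downsample_factor - 2)
label_length = np.ones(batch_size) * label_len
```

With these numbers each sample has int(140 / 4 - 2) = 33 time steps, which satisfies the rule of thumb that the number of time steps must be >= 2 * label_len (here 2 * 5 = 10); if it did not, you would reduce the number of MaxPool2D layers and the downsample factor.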

PS: English is not my first language, so if you don't understand anything, let me know.

justlike-prog commented 5 years ago

Ok thanks so much :)