testing_generator = CapchaDataGenerator(list_files, list_labels, batch_size=BATCH_SIZE, is_testing=True)
Ah ok gotcha :) Thanks so much :)
One short question (I don't want to open another thread) - I saw there is a downsample factor, but I'm not sure why it is there. It is used here, where I also don't understand why the np.ones vectors are introduced - why are they necessary?
return [X, y, np.ones(self.batch_size) * int(width / downsample_factor - 2), np.ones(self.batch_size) * label_len], y
Also, if I have bigger images than you are using, is there some parameter I should tune, or does the network perform just as well on bigger images? Where do I tune the number of time steps, for example?
PS: I think one can use tf.keras.backend.ctc_decode instead of coding the function from scratch :)
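For reference, usage looks roughly like this (a minimal sketch with made-up shapes, not your actual code):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

# Fake softmax output standing in for a real model's prediction:
# (batch, timesteps, num_classes), where num_classes includes the CTC blank.
batch, timesteps, num_classes = 2, 30, 37
y_pred = tf.random.uniform((batch, timesteps, num_classes))

# One input length per sample, like the np.ones vectors above.
input_length = np.full(batch, timesteps)

decoded, log_probs = K.ctc_decode(y_pred, input_length, greedy=True)
print(decoded[0].numpy())  # best-path label indices; -1 pads short results
```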
Model: Image -> CNN -> LSTM -> CTC loss
downsample factor = 2 ^ (number of MaxPool2D layers in CNN)
I used the downsample factor to determine the width of the final output: number of timesteps in the LSTM = width of the CNN's output = image width // downsample factor.
I print the CNN's output shape in model.py:
print(x_reshape.get_shape())
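To illustrate (a toy sketch, not the repo's actual model.py - the layer sizes here are made up):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Two MaxPool2D layers => downsample_factor = 2 ** 2 = 4,
# so a 128-pixel-wide image gives 128 // 4 = 32 LSTM timesteps.
inp = layers.Input(shape=(64, 128, 1))                      # (height, width, channels)
x = layers.Conv2D(16, 3, padding='same', activation='relu')(inp)
x = layers.MaxPool2D(2)(x)                                  # width 128 -> 64
x = layers.Conv2D(32, 3, padding='same', activation='relu')(x)
x = layers.MaxPool2D(2)(x)                                  # width 64 -> 32
print(x.shape)                                              # (None, 16, 32, 32)

# Make the width axis the time axis, then flatten height x channels
# into the feature vector for each timestep.
x = layers.Permute((2, 1, 3))(x)                            # (None, 32, 16, 32)
x_reshape = layers.Reshape((32, 16 * 32))(x)
print(x_reshape.shape)                                      # (None, 32, 512): 32 timesteps
```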
Because I'm training in batches. Those np.ones vectors tell the CTC loss the input length and label length for each sample. My captchas all have the same number of characters, so every sample has the same length, but this setup allows you to train on and predict captchas of different lengths.
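Concretely, this is roughly where those vectors end up (a sketch assuming the standard K.ctc_batch_cost signature, with made-up shapes):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

batch_size, timesteps, num_classes, label_len = 4, 30, 37, 5

# Stand-ins for real softmax predictions and integer-encoded labels.
y_pred = tf.random.uniform((batch_size, timesteps, num_classes))
y_true = tf.random.uniform((batch_size, label_len),
                           minval=0, maxval=num_classes - 1, dtype=tf.int32)

# One length per sample: all equal here, but they may differ per captcha.
input_length = np.ones((batch_size, 1)) * timesteps
label_length = np.ones((batch_size, 1)) * label_len

loss = K.ctc_batch_cost(y_true, y_pred, input_length, label_length)
print(loss.shape)  # (4, 1): one CTC loss value per sample
```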
You can change width and height in config.py based on your dataset. In order to calculate the CTC loss, the number of timesteps must be >= 2 * label_len, so you may have to change the number of MaxPool2D layers in the CNN and update the downsample factor accordingly.
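As a quick sanity check (values here are hypothetical, not the entries in the actual config.py):

```python
# Hypothetical stand-ins for the values in config.py.
width = 128           # image width
num_maxpool = 2       # MaxPool2D layers in the CNN
label_len = 5         # characters per captcha

downsample_factor = 2 ** num_maxpool
timesteps = width // downsample_factor   # 128 // 4 = 32

# The rule of thumb above: enough timesteps to emit every label,
# with blanks between repeated characters.
assert timesteps >= 2 * label_len, "increase width or drop a MaxPool2D layer"
```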
Thanks, I already tried it, but there are things I like to fully control and understand, like confidence scores, filtering the beam search with some tricks, etc.
PS: English is not my first language, so if you don't understand anything, let me know.
Ok thanks so much :)
First, thanks for your nicely working captcha recognition repo. I have a question regarding the CRNN:
1) In the ctc_lambda_func you have iy_pred = iy_pred[:, 2:, :], but I always thought the prediction is two-dimensional, since it is a matrix of probabilities, one row per "slice" (time step) of the image, over the characters it could represent?
2) In predict.py, why do you loop over y_pred? Don't we always get just one prediction?