weinman / cnn_lstm_ctc_ocr

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR
GNU General Public License v3.0

often recognize 'u' wrongly #42

Closed. kojit closed this issue 5 years ago

kojit commented 5 years ago

Hello,

I trained your model on the mjsynth dataset with the default parameter settings over 1,000,000 steps. I found that the model often recognizes the character 'u' incorrectly. It seems as if there is no 'u' class. Do you have any thoughts about what the cause might be?

kojit commented 5 years ago

I've now trained the model for 2**21 steps, but it still cannot recognize 'u' correctly. I also found that it doesn't recognize 'q' at all. I tested with several images and found that the probabilities of 'u' and 'q' before the CTC layer are always 0. Has anyone had a similar experience?

kojit commented 5 years ago

It's weird, though: after I changed only the CNN model to Shi et al.'s CRNN architecture, it recognizes 'u'.

weinman commented 5 years ago

What is the training loss? Validation loss? What values does test.py report on the test data?

Training with the default parameters can sometimes get stuck (quite early in training) in a poor local minimum. I've never investigated specific character-level confusions/probabilities, but I definitely don't see this behavior in my own experience.

To avoid local minima, I have set up an alternative training schedule that starts with a small batch size, increasing it from 16 to 128 as the learning rate (with no staircase) decreases from 1e-4 down to 3e-6. (See Takase et al.)

kojit commented 5 years ago

Thanks for your reply.

test.py reports the following, although I didn't use the entire test set because it's too slow.

{'total_num_labels': 144942, 'total_num_sequence_errs': 3892, 'total_num_label_errors': 6711, 'mean_label_error': 0.04630127913234259, 'loss': 1.5078024, 'total_num_sequences': 17837, 'mean_sequence_error': 0.21819812748780623, 'global_step': 2097152}
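
As a quick sanity check, the reported means are just the error counts divided by the totals above:

```python
# Re-derive the reported means from the raw counts in the test.py output above.
total_num_labels = 144942
total_num_label_errors = 6711
total_num_sequences = 17837
total_num_sequence_errs = 3892

mean_label_error = total_num_label_errors / total_num_labels          # ~0.0463
mean_sequence_error = total_num_sequence_errs / total_num_sequences   # ~0.2182
print(mean_label_error, mean_sequence_error)
```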

I understand that you've never seen this problem and that you think it's a local minimum. I'd like to try changing the batch size.

weinman commented 5 years ago

Those label error rates and sequence error rates seem pretty reasonable. Maybe it's not a local minimum.

That loss seems a bit high, but I just realized that test.py probably reports only the last test batch's loss (rather than a cumulative average, which it should).

What's the smoothed training loss (i.e., as reported in TensorBoard)? (Say, with a smoothing factor of something like 0.95.)
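
(For reference, TensorBoard's smoothing is essentially an exponential moving average of the logged scalars; a minimal sketch of what a 0.95 factor does:)

```python
def smooth(values, weight=0.95):
    """Exponential moving average, roughly what TensorBoard's
    smoothing slider applies to the raw scalar values."""
    smoothed = []
    last = values[0]
    for v in values:
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed

# smooth(raw_training_losses)[-1] would be the final smoothed loss value
```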

My training schedule is as follows:

| Batch Size | Learning Rate | Steps (Cumulative) |
|------------|---------------|---------------------|
| 16         | 1e-4          | 2^16                |
| 32         | 3e-5          | 2^18                |
| 64         | 3e-5          | 2^19                |
| 128        | 1e-5          | 2^19 + 2^18         |
| 128        | 3e-6          | 2^20                |

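If it helps, here's a minimal sketch of how such a staged run could be scripted. The flag names and checkpoint path below are illustrative assumptions rather than the exact train.py interface, so check the script's arguments before using it:

```python
import subprocess

# Each stage trains to a larger cumulative step count with a bigger batch
# size and a lower learning rate, resuming from the previous checkpoint.
# NOTE: flag names and the checkpoint path are assumptions for illustration;
# they may not match train.py exactly.
stages = [
    # (batch_size, learning_rate, cumulative_steps)
    (16,  1e-4, 2**16),
    (32,  3e-5, 2**18),
    (64,  3e-5, 2**19),
    (128, 1e-5, 2**19 + 2**18),
    (128, 3e-6, 2**20),
]

for batch_size, learning_rate, max_steps in stages:
    subprocess.run(
        ["python", "train.py",
         "--batch_size", str(batch_size),
         "--learning_rate", str(learning_rate),
         "--max_num_steps", str(max_steps),
         "--tune_from", "data/model"],  # hypothetical checkpoint path
        check=True)
```
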
kojit commented 5 years ago

Smoothed training loss is 1.072.

weinman commented 5 years ago

Oh yeah, that's probably not very good. You want it down around 0.4–0.5.

The colors below indicate the training sessions in the table above.

[TensorBoard plot of smoothed training loss for each training session]

kojit commented 5 years ago

I've already trained the model for 2**21 steps. What went wrong...?

weinman commented 5 years ago

@kojit I forgot to add: I set --decay_rate=1.0, so the learning rate was fixed at each stage of training.

I recommend you read the recent Neural Computation paper I cited above to get a sense of why it's not the number of steps but the batch size that can have an overriding performance impact.
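
(For context: exponential decay computes lr = initial_lr * decay_rate^(step / decay_steps), as in tf.train.exponential_decay, so with decay_rate=1.0 the exponent has no effect and the rate stays at whatever was set for that stage. A tiny illustration:)

```python
def decayed_lr(initial_lr, decay_rate, step, decay_steps, staircase=False):
    # Same formula tf.train.exponential_decay uses; staircase=True would
    # take the exponent by integer division, giving a stepped schedule.
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent

# With decay_rate=1.0 the learning rate never changes, regardless of step:
assert decayed_lr(3e-5, 1.0, 2**19, 2**16) == 3e-5
```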

kojit commented 5 years ago

Thanks. I will try with that and report later.

sahilbandar commented 5 years ago

Same here. I've trained this for 1^21 epochs, and it is not able to recognise '8' and '9'. Is there anything I have to modify in the training hyperparameter settings?

weinman commented 5 years ago

@sahilbandar Just set the decay rate to 1, as well as the batch size, learning rate, max number of steps (and tune from), to follow the schedule noted above.