kojit closed this issue 5 years ago
I've trained the model for 2**21 steps, but it still cannot recognize 'u' correctly, and I also found that it doesn't recognize 'q' at all. I tested with several images and found that the probabilities of 'u' and 'q' before the CTC layer are always 0. Has anyone had a similar experience?
It's weird though: after I changed only the CNN portion to Shi et al.'s CRNN architecture, it recognizes 'u'.
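If you want to check for "dead" classes directly, one way (a sketch, assuming you can pull the pre-CTC logits out as a NumPy array of shape `[time, num_classes]`; the `dead_classes` helper and `charset` list are mine, not part of the repo) is to look for classes that never receive any probability mass:

```python
import numpy as np

def dead_classes(logits, charset, eps=1e-6):
    """Return characters whose maximum softmax probability over all
    frames never exceeds eps -- i.e., classes the model never predicts."""
    # Numerically stable softmax over the class axis, per time frame
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    max_per_class = probs.max(axis=0)  # best probability each class ever gets
    return [c for c, p in zip(charset, max_per_class) if p < eps]

# Hypothetical example: class 2 ('u') gets large negative logits everywhere
logits = np.array([[5.0, 1.0, -30.0],
                   [0.5, 4.0, -30.0]])
print(dead_classes(logits, ['q', 'a', 'u']))  # -> ['u']
```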
What is the training loss? Validation loss? What values does test.py report on the test data?
The default training parameters sometimes can get stuck (quite early in training) in a poor local minimum. I've never investigated specific character-level confusions/probabilities, but I definitely don't see this behavior in my own experiences.
To avoid local minima, I have set up an alternative training schedule that starts with a small batch size, increasing it from 16 to 128 as the learning rate (no staircase) decreases from 1e-4 down to 3e-6. (See Takase et al.)
Thanks for your reply.
test.py shows as follows although I didn't use the entire test set because it's too slow.
{'total_num_labels': 144942, 'total_num_sequence_errs': 3892, 'total_num_label_errors': 6711, 'mean_label_error': 0.04630127913234259, 'loss': 1.5078024, 'total_num_sequences': 17837, 'mean_sequence_error': 0.21819812748780623, 'global_step': 2097152}
I understand that you've never seen this problem and you think it's a local minimum. I'd like to try changing the batch size.
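As a sanity check, the reported mean error rates follow directly from the raw counts in that dict:

```python
# Raw counts copied from the test.py output above
stats = {'total_num_labels': 144942, 'total_num_label_errors': 6711,
         'total_num_sequences': 17837, 'total_num_sequence_errs': 3892}

# Character-level and whole-sequence error rates
label_err = stats['total_num_label_errors'] / stats['total_num_labels']
seq_err = stats['total_num_sequence_errs'] / stats['total_num_sequences']
print(round(label_err, 4), round(seq_err, 4))  # -> 0.0463 0.2182
```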
Those label error rates and sequence error rates seem pretty reasonable. Maybe it's not a local minimum.
That loss seems a bit high, but I just realized that test.py probably reports only the last test batch's loss (rather than a cumulative average, which it should).
What's the smoothed training loss (i.e., as reported in tensorboard)? (Say with a smoothing factor of something like 0.95.)
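For reference, TensorBoard's smoothing slider applies an exponential moving average. A small sketch of what a 0.95 factor does to a raw loss curve (TensorBoard additionally debiases the average; this simplified version seeds it with the first value instead, and the loss numbers are made up for illustration):

```python
def smooth(values, factor=0.95):
    """Exponential moving average, similar to TensorBoard's smoothing:
    each point is factor * previous_smoothed + (1 - factor) * raw value."""
    out, prev = [], values[0]
    for v in values:
        prev = factor * prev + (1 - factor) * v
        out.append(prev)
    return out

raw = [2.0, 1.8, 1.5, 1.6, 1.2, 1.1]
print([round(s, 3) for s in smooth(raw)])  # noisy curve becomes a slow drift
```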
My training schedule is as follows:
Batch Size | Learning Rate | Steps (Cumulative)
--- | --- | ---
16 | 1e-4 | 2^16
32 | 3e-5 | 2^18
64 | 3e-5 | 2^19
128 | 1e-5 | 2^19 + 2^18
128 | 3e-6 | 2^20
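For anyone wanting to script this, the staged schedule above can be written as a simple lookup (a sketch; the tuples mirror the table, and the `schedule` helper name is mine, not part of the repo):

```python
def schedule(step):
    """Return (batch_size, learning_rate) for a cumulative step count,
    following the staged schedule in the table above."""
    stages = [
        # (last cumulative step of stage, batch size, learning rate)
        (2**16,         16,  1e-4),
        (2**18,         32,  3e-5),
        (2**19,         64,  3e-5),
        (2**19 + 2**18, 128, 1e-5),
        (2**20,         128, 3e-6),
    ]
    for end_step, batch, lr in stages:
        if step <= end_step:
            return batch, lr
    return stages[-1][1], stages[-1][2]  # past 2**20: keep the final settings

print(schedule(1000))    # -> (16, 0.0001)
print(schedule(2**19))   # -> (64, 3e-05)
```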
Smoothed training loss is 1.072.
Oh yeah that's probably not very good. You want it down around 0.4–0.5.
The colors in the attached plot indicate the training sessions in the table above.
I've trained the model for 2**21 steps. What went wrong...?
@kojit I forgot to add: I set --decay_rate=1.0, so the learning rate was fixed at each stage of training.
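For context, TensorFlow's `tf.train.exponential_decay` (without staircase) computes the rate as `initial_lr * decay_rate ** (step / decay_steps)`. A quick sketch of that formula (parameter values here are illustrative, not the repo's defaults) shows why decay_rate=1.0 pins the learning rate:

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """TensorFlow-style exponential decay, no staircase:
    lr = initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)

# With decay_rate=1.0 the learning rate never changes...
print(exponential_decay(1e-4, 1.0, 2**16, 500000))  # -> 0.0001
# ...whereas any decay_rate < 1 keeps shrinking it over time.
print(exponential_decay(1e-4, 0.9, 2**16, 500000))
```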
I recommend you read the recent Neural Computation paper I cited above to get a sense of why it's not the number of steps but the batch size that can have an overriding performance impact.
Thanks. I will try with that and report later.
Same here. I've trained this for 2^21 steps, and it is not able to recognise 8 and 9. Is there anything I need to modify in the training hyperparameter settings?
@sahilbandar Just set decay_rate=1, along with the batch size, learning rate, max number of steps (and tune_from), to follow the schedule noted above.
Hello,
I trained your model on the mjsynth dataset with default parameter settings for over 1,000,000 steps. I found that the model often misrecognizes the character 'u'; it seems as if there is no 'u' class. Do you have any thoughts about what the cause might be?