mpezeshki / CTC-Connectionist-Temporal-Classification

Theano implementation of CTC.
Apache License 2.0

p(blank symbol) >> p(non-blank symbol) during NN-CTC training #4

Open lifelongeek opened 9 years ago

lifelongeek commented 9 years ago

Hi all

I want to discuss an issue regarding training DNN/CNN-CTC for speech recognition (Wall Street Journal corpus). I modeled the output units as characters.

I observed that the CTC objective function increased during training and finally converged.

[figure: CTC objective vs. training iterations]

But I also observed a clear tendency in the final NN outputs: p(blank symbol) >> p(non-blank symbol) for all speech time frames, as in the following figure.

[figure: per-frame output probabilities, with the blank dominating at every frame]
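(For reference, a minimal sketch, not from this repo, of how I check this per frame. It assumes numpy softmax outputs of shape (T, num_labels) with the blank at index 0, which is just a convention; the helper name is only for illustration.)

```python
import numpy as np

def blank_dominance(probs, blank_index=0):
    """Per frame: p(blank) vs. the best non-blank probability."""
    p_blank = probs[:, blank_index]
    p_best_non_blank = np.delete(probs, blank_index, axis=1).max(axis=1)
    return p_blank, p_best_non_blank

# Toy example with random outputs; in practice `probs` is the network's
# per-frame softmax over characters + blank.
T, num_labels = 100, 29
logits = np.random.randn(T, num_labels)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
p_blank, p_best = blank_dominance(probs)
print("frames where blank wins:", int((p_blank > p_best).sum()), "of", T)
```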

In Alex Graves' paper, the trained RNN should have high p(non-blank) at some points (sharp spikes at the predicted labels), as in the following figure.

[figure: spiky label posteriors from Graves' paper]
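(To make the contrast concrete, here is a minimal sketch of best-path decoding, the greedy variant, assuming the same numpy conventions as above: a few confident non-blank spikes among mostly-blank frames still decode to the label sequence, while uniformly blank-dominant outputs decode to the empty sequence.)

```python
import numpy as np

def best_path_decode(probs, blank_index=0):
    """Greedy CTC decoding: per-frame argmax, collapse repeats, drop blanks."""
    decoded, prev = [], None
    for label in probs.argmax(axis=1):
        if label != prev and label != blank_index:
            decoded.append(int(label))
        prev = label
    return decoded

# Spiky posteriors (Graves-style): blank dominates most frames, with two
# confident non-blank spikes.
spiky = np.full((8, 4), 0.01)
spiky[:, 0] = 0.97
spiky[2] = [0.01, 0.01, 0.97, 0.01]   # spike on label 2
spiky[6] = [0.01, 0.01, 0.01, 0.97]   # spike on label 3
print(best_path_decode(spiky))        # -> [2, 3]

# Blank dominant everywhere (my situation): decodes to nothing.
all_blank = np.full((8, 4), 0.01)
all_blank[:, 0] = 0.97
print(best_path_decode(all_blank))    # -> []
```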

Do you see the same behavior when you train NN-CTC for a sequence labeling problem? I suspect the reason is that I use an MLP/CNN instead of an RNN, but I can't clearly explain why that would cause this.
Any ideas about this result?

Thank you for reading my question.

tbluche commented 8 years ago

Hi, I have had much the same experience with handwriting recognition. I explored CTC training with different NNs during my PhD, and the results are the following: