Hi all,

I want to discuss an issue I ran into while training a DNN/CNN-CTC model for speech recognition (Wall Street Journal corpus). The output units are characters.
I observed that the CTC objective function increased and eventually converged during training.
However, the final network outputs show a clear tendency: p(blank) >> p(non-blank) for every speech frame, as shown in the first figure.
In Alex Graves' paper, the trained RNN instead produces sharp peaks of high p(non-blank) at certain frames, as shown in the second figure.
Do you see the same behavior when training an NN-CTC model on a sequence labeling problem? I suspect the reason is that I use an MLP/CNN instead of an RNN, but I can't clearly explain why that would cause this.
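To make the observation concrete, here is a minimal sketch of how one might quantify the blank dominance; it assumes PyTorch-style per-frame log-probabilities with the blank label at index 0, which is an assumption rather than the original setup.

import torch

# Hypothetical diagnostic (not the original code): given per-frame
# log-probabilities of shape (T, C) with the CTC blank at index 0, measure
# how often blank wins the argmax and how large its average probability is.
def blank_dominance(log_probs: torch.Tensor, blank_idx: int = 0):
    probs = log_probs.exp()                                   # back to probabilities
    frac_blank_frames = (probs.argmax(dim=-1) == blank_idx).float().mean().item()
    mean_blank_prob = probs[:, blank_idx].mean().item()
    return frac_blank_frames, mean_blank_prob

# Toy example with random outputs, just to show the call.
T, C = 200, 32                                                # frames, characters + blank
dummy = torch.log_softmax(torch.randn(T, C), dim=-1)
print(blank_dominance(dummy))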
Any ideas about this result? Thank you for reading my question.
CTC training makes NNs first learn to predict only blanks; it can take some time before meaningful non-blank predictions appear. Adaptive learning-rate methods like RMSProp work very well to get past this phase.
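A minimal PyTorch sketch of such a setup (the model, feature size, and label count here are illustrative assumptions, not the original configuration):

import torch
import torch.nn as nn

# Sketch only: one CTC training step with RMSprop.
# nn.CTCLoss expects log_probs of shape (T, N, C), blank at index 0 here.
model = nn.Sequential(nn.Linear(40, 256), nn.ReLU(), nn.Linear(256, 32))  # toy frame classifier
ctc_loss = nn.CTCLoss(blank=0)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)

def train_step(feats, targets, input_lengths, target_lengths):
    # feats: (T, N, 40) acoustic features; targets: concatenated label indices
    log_probs = model(feats).log_softmax(dim=-1)          # (T, N, C)
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()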
Maybe training for one epoch with HMM Viterbi alignments (framewise targets) before switching to CTC would help; starting from scratch, it can be hard to learn to align and transcribe at the same time.
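A rough sketch of that warm-up stage, assuming framewise Viterbi alignments are available as per-frame label indices (all names and shapes are hypothetical):

import torch
import torch.nn as nn

# Hypothetical warm-up: framewise cross-entropy on HMM/Viterbi alignments
# for roughly one epoch, before switching to a CTC training step.
model = nn.Sequential(nn.Linear(40, 256), nn.ReLU(), nn.Linear(256, 32))
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
framewise_ce = nn.CrossEntropyLoss()

def warmup_step(frames, aligned_labels):
    # frames: (B, 40) acoustic frames; aligned_labels: (B,) Viterbi label per frame
    loss = framewise_ce(model(frames), aligned_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# After the warm-up epoch, continue training the same model with the CTC loss.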