tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

Blank outputs when using CTC loss on TensorFlow 2 #38777

Closed Victor-Almeida closed 4 years ago

Victor-Almeida commented 4 years ago

Hello.

I'm trying to use TensorFlow's tf.nn.ctc_loss for a speech recognition problem, but the network seems to learn that the best way to reduce the loss is to output nothing but blanks. I've tried other implementations, like this and this, but they have the same problem.
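For context, the loss call looks roughly like this (a minimal sketch with placeholder shapes and a placeholder blank index, not my exact pipeline):

```python
import tensorflow as tf

# Placeholder shapes; the real pipeline feeds audio features through an RNN.
batch, max_time, num_classes = 4, 1200, 29       # 28 symbols + 1 blank
label_len = 50

logits = tf.random.normal([batch, max_time, num_classes])
labels = tf.random.uniform([batch, label_len], maxval=num_classes - 1,
                           dtype=tf.int32)        # must not contain the blank id

loss = tf.nn.ctc_loss(
    labels=labels,
    logits=logits,
    label_length=tf.fill([batch], label_len),
    logit_length=tf.fill([batch], max_time),
    logits_time_major=False,   # these logits are [batch, time, classes]
    blank_index=-1)            # last class is the blank
loss = tf.reduce_mean(loss)
```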

Here is the gist for my own implementation and here is the link to my Google Drive folder with the files used.

I'm using Google Colab's high-RAM runtime with GPU and TensorFlow version 2.2.0-rc3.

Also, for some reason I get the error ValueError: Dimension must be 2 but is 3 for '{{node transpose}} = Transpose[T=DT_FLOAT, Tperm=DT_INT32](model_52/Placeholder, transpose/perm)' with input shapes: [1200,29], [3]. when applying tf.function to the train_step method of the CTC_SR class while using the Encoder_Decoder class, but not when using the actual Keras layers. When I do use tf.function with the Keras layers, though, training takes far longer. Why is that?
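If I read the message right, transpose is receiving rank-2 logits of shape [1200, 29] together with a length-3 permutation, which is what tf.nn.ctc_loss uses to reorder rank-3 logits. My guess (not verified against the gist) is that the batch dimension gets squeezed away somewhere under tf.function:

```python
# Hypothetical repair: tf.nn.ctc_loss transposes rank-3 logits, so a
# rank-2 [1200, 29] tensor would trigger exactly this ValueError.
# `model` and `features` are stand-ins for the gist's objects.
logits = model(features)                 # suppose this yields [1200, 29]
if logits.shape.rank == 2:
    logits = tf.expand_dims(logits, 0)   # restore batch dim -> [1, 1200, 29]
```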

ravikyram commented 4 years ago

@Victor-Almeida

In order to expedite the trouble-shooting process, please provide a Colab link or minimal standalone code to reproduce the issue reported here. It helps us localize the issue faster. Thanks!

Victor-Almeida commented 4 years ago

I did.

The gist --> https://gist.github.com/Victor-Almeida/df1d0dc2cea318216d320d029dc8e64f
The Google Drive folder --> https://drive.google.com/open?id=1bgGte_wVyaYAycBntQA8uQWmVQZqhPYH

ravikyram commented 4 years ago

I have tried on Colab with TF version 2.2-rc3. Please find the gist here. Is this the expected behavior? Thanks!

Victor-Almeida commented 4 years ago

Yes, that's what's happening. The predictions are all blank, and you can see it during training because the label error rate (LER) stays at 1.
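An LER of exactly 1 is what you get when the greedy decode is empty, i.e. every frame argmaxes to blank. A sketch of the metric (assuming sparse int64 references; not the gist's exact code):

```python
import tensorflow as tf

def label_error_rate(logits, logit_length, sparse_labels):
    # logits: [batch, time, classes]; sparse_labels: int64 tf.SparseTensor.
    # ctc_greedy_decoder wants time-major logits: [time, batch, classes].
    decoded, _ = tf.nn.ctc_greedy_decoder(
        tf.transpose(logits, [1, 0, 2]), logit_length)
    # Normalized edit distance: an empty decode against any non-empty
    # reference gives exactly 1.0 -- the value stuck here during training.
    return tf.reduce_mean(tf.edit_distance(decoded[0], sparse_labels))
```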

ebrevdo commented 4 years ago

Modeling questions are a better fit for StackOverflow than GitHub issues. That said, in my experience the typical explanation here is that the model is at an intermediate stage of training. The usual progression with CTC is that the model first learns to emit only blanks; then it learns the outer edges of the tokens to emit; then, after more epochs, it learns to emit the intermediate tokens. This assumes the model and the underlying RNN architecture have enough capacity to do so.

To summarize: you probably haven't trained for long enough, your model capacity is too low, or your optimizer isn't well tuned.
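One cheap sanity check is to track how often the argmax lands on the blank class; if that ratio stays near 1.0 for many epochs, the model is still stuck in the first stage. A rough sketch (the callback and its arguments are hypothetical, not part of any TF API):

```python
import tensorflow as tf

class BlankRatio(tf.keras.callbacks.Callback):
    """Logs the fraction of frames whose argmax is the blank class."""
    def __init__(self, sample_batch, blank_index):
        super().__init__()
        self.sample_batch = sample_batch    # a batch of input features
        self.blank_index = blank_index      # e.g. num_classes - 1

    def on_epoch_end(self, epoch, logs=None):
        logits = self.model(self.sample_batch, training=False)
        is_blank = tf.equal(tf.argmax(logits, axis=-1),
                            tf.cast(self.blank_index, tf.int64))
        ratio = tf.reduce_mean(tf.cast(is_blank, tf.float32))
        print(f"epoch {epoch}: blank ratio = {ratio.numpy():.3f}")
```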

I'll close this for now; you'll probably want to follow up on the convergence question on StackOverflow, Reddit, or similar.
