Closed Victor-Almeida closed 4 years ago
@Victor-Almeida
In order to expedite the troubleshooting process, please provide a Colab link or minimal standalone code to reproduce the issue reported here. It helps us localize the issue faster. Thanks!
I did.
The gist --> https://gist.github.com/Victor-Almeida/df1d0dc2cea318216d320d029dc8e64f
The Google Drive folder --> https://drive.google.com/open?id=1bgGte_wVyaYAycBntQA8uQWmVQZqhPYH
I have tried on Colab with TF version 2.2-rc3. Please find the gist here. Is this the expected behavior? Thanks!
Yes, that's what's happening. Predictions are all blank, and you can see that during training because the LER is always 1.
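For context, greedy CTC decoding collapses consecutive repeats and then drops blanks, so a frame-wise all-blank output decodes to the empty string, and the label error rate (edit distance normalized by label length) is then exactly 1. A minimal pure-Python sketch, assuming blank index 0 (`greedy_ctc_decode` and `ler` are illustrative helpers, not functions from the gist):

```python
def greedy_ctc_decode(frame_ids, blank=0):
    """Collapse consecutive repeats, then remove blanks."""
    out, prev = [], None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def ler(pred_frames, label, blank=0):
    """Label error rate: edit distance over label length."""
    decoded = greedy_ctc_decode(pred_frames, blank)
    return edit_distance(decoded, label) / len(label)

# An all-blank prediction decodes to [] and gives LER 1 for any label:
print(ler([0, 0, 0, 0, 0], [5, 2, 7]))  # -> 1.0
```

This is why a network stuck emitting blanks shows a flat LER of 1 during training, regardless of the labels.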
Modeling questions are a better fit for StackOverflow than GitHub issues. However, from experience, the typical explanation here is that the model is in an intermediate stage of training. The typical progression with CTC is that the model first learns to emit only blanks; then it starts learning the outer edges of the tokens to emit; then, after more epochs, it learns to emit the intermediate tokens. This assumes your model and the architecture of the underlying RNN have enough capacity to do so.
To summarize: most likely you haven't trained for long enough, your model capacity is too low, or your optimizer isn't well tuned.
I'll close this for now; you will probably want to follow up on the convergence question on StackOverflow, Reddit, or similar.
Hello.
I'm trying to use TensorFlow's `tf.nn.ctc_loss` for a speech recognition problem, but it seems to be causing the network to learn that the best way to reduce the loss is to output blanks. I've tried other implementations, like this and this, but they have the same problem. Here is the gist for my own implementation and here is the link to my Google Drive folder with the files used.
I'm using Google Colab's high-RAM runtime with GPU and Tensorflow version 2.2.0-rc3.
Also, for some reason I get this error
ValueError: Dimension must be 2 but is 3 for '{{node transpose}} = Transpose[T=DT_FLOAT, Tperm=DT_INT32](model_52/Placeholder, transpose/perm)' with input shapes: [1200,29], [3].
when trying to use `tf.function` on the `train_step` method of the `CTC_SR` class when using the `Encoder_Decoder` class, but not when using the actual Keras layers. When using `tf.function` with the Keras layers, though, training takes far longer. Why is that?