chandraprakash5 opened this issue 8 years ago
I think the problem is that internally Keras tries to apply the mask to the loss, which no longer has a time dimension (since CTC returns a shape of (batch, 1)). You could add a layer at the end of your network that removes the mask (a layer that returns None in compute_mask).
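The mask-removing layer suggested above might look like the following minimal sketch. It is written against the tf.keras API (the thread predates tf.keras, so adapt the import for standalone Keras), and the `RemoveMask` name is mine:

```python
import tensorflow as tf

class RemoveMask(tf.keras.layers.Layer):
    """Identity layer that consumes the incoming mask.

    Placed after the last time-distributed layer, it stops Keras from
    trying to apply a (batch, time) mask to a (batch, 1) CTC loss.
    """
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True  # accept an incoming mask

    def call(self, inputs, mask=None):
        return inputs  # pass activations through unchanged

    def compute_mask(self, inputs, mask=None):
        return None  # drop the mask for everything downstream
```

Appending `model.add(RemoveMask())` after the final `TimeDistributed(Dense(...))` keeps the activations intact while ensuring no mask reaches the loss.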
Alternatively, implement the CTC loss as a layer: use the mask to compute the activation sequence lengths, then discard the mask itself as described above.
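For that second route, the per-utterance input lengths that Keras's `ctc_batch_cost` expects can be recovered from the boolean mask by summing over the time axis. A framework-agnostic sketch in NumPy (the example mask and variable names are mine):

```python
import numpy as np

# Hypothetical mask as produced by a Masking layer: True where the
# frame is real, False where it is zero-padding.
mask = np.array([[True, True, True, False, False],
                 [True, True, True, True,  True]])

# Length of each utterance = number of unmasked time steps.
# keepdims gives the (batch, 1) shape that ctc_batch_cost expects.
input_length = mask.sum(axis=-1, keepdims=True).astype("int32")
print(input_length.ravel())  # -> [3 5]
```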
When a Masking layer is used for speech utterances of variable length, an input dimension mismatch error is thrown. The following is the edited model from test_keras.py that reproduces the error:
```python
from keras.models import Sequential
from keras.layers import Masking, LSTM, BatchNormalization, Dense, TimeDistributed

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(frame_len, nb_feat)))
model.add(LSTM(inner_dim, return_sequences=True))
model.add(BatchNormalization())
model.add(TimeDistributed(Dense(nb_output)))
```
```
ValueError: GpuElemwise. Input dimension mis-match. Input 1 (indices start at 0) has shape[1] == 80, but the output's size on that axis is 16.
```
Please suggest how a Masking layer can be used together with CTC loss in Keras.