mcf06 / theano_ctc

Theano bindings for Baidu's CTC library.
BSD 3-Clause "New" or "Revised" License

CTC loss for test_keras.py throws error when Masking layer is used #13

Open chandraprakash5 opened 8 years ago

chandraprakash5 commented 8 years ago

When a Masking layer is used for speech utterances of variable length, an input dimension mismatch error is thrown. The following is the edited model from test_keras.py that reproduces the error.

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(frame_len, nb_feat)))
model.add(LSTM(inner_dim, return_sequences=True))
model.add(BatchNormalization())
model.add(TimeDistributed(Dense(nb_output)))

ValueError: GpuElemwise. Input dimension mis-match. Input 1 (indices start at 0) has shape[1] == 80, but the output's size on that axis is 16.

Please suggest how a Masking layer can be used with CTC loss in Keras.

githubnemo commented 8 years ago

I think the problem is that internally Keras tries to apply the mask to the loss, which no longer has a time dimension (since CTC returns a shape of (batch, 1)). You could add a layer at the end of your network that removes the mask (a layer whose compute_mask returns None).
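A minimal sketch of such a mask-removing layer, assuming a current Keras Layer API (the layer name RemoveMask is made up for illustration):

```python
from keras.layers import Layer


class RemoveMask(Layer):
    """Pass inputs through unchanged, but drop the incoming mask."""

    def __init__(self, **kwargs):
        super(RemoveMask, self).__init__(**kwargs)
        # Accept a mask from the previous layer instead of raising.
        self.supports_masking = True

    def compute_mask(self, inputs, mask=None):
        # Return None so Keras stops propagating the mask and does
        # not try to apply it to the (batch, 1)-shaped CTC loss.
        return None

    def call(self, inputs, mask=None):
        # Identity on the data itself.
        return inputs
```

Placed after the TimeDistributed(Dense(...)) output, it should keep the mask from reaching the loss computation.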

Alternatively, implement the CTC loss as a layer and use the mask to compute the activation sequence lengths, while discarding the mask afterwards as described above.
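The length computation such a loss layer would do can be sketched in plain NumPy; the mask array here is a hypothetical (batch, time) boolean mask of the kind Keras's Masking layer produces:

```python
import numpy as np

# Hypothetical mask for a batch of two utterances padded to 5 frames:
# True where a frame is real data, False where it is padding.
mask = np.array([
    [True, True, True, False, False],
    [True, True, True, True,  True],
])

# CTC needs one integer length per utterance; summing the mask
# along the time axis recovers it.
seq_lens = mask.sum(axis=1).astype("int32")
print(seq_lens)  # -> [3 5]
```

Inside an actual loss layer the same sum would be done with backend tensor ops on the mask passed to call, and the result fed to the CTC function as the input-lengths argument.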