Long time (over 300 frames) problem in speech recognition (in TIMIT data)

rakeshvar / rnn_ctc

Recurrent Neural Network and Long Short Term Memory (LSTM) with Connectionist Temporal Classification implemented in Theano. Includes a Toy training example.

Apache License 2.0


Long time (over 300 frames) problem in speech recognition (in TIMIT data) #2

kimkwangho82 closed this issue 9 years ago

kimkwangho82 commented 9 years ago

Hello!

We are running into a problem with long sequences (over 300 frames) in speech recognition on TIMIT data.

In general, speech recognition uses long feature sequences, for example 300 frames for a 3-second utterance. When we analyzed the scan function in your 'ctc.py', the probabilities variable appears to be computed as zero for sequences of over 300 frames, and the 'cost' variable comes out as 'Inf'.
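A minimal sketch of how a zero probability and an 'Inf' cost can arise together on long sequences. It does not use the rnn_ctc code; the per-frame probability `p_step = 0.05` is an assumed, illustrative value, and plain Python floats (double precision) stand in for whatever precision the real model uses.

```python
import math

frames = 300
p_step = 0.05  # assumed per-frame probability along one alignment path

# Linear domain: the product of 300 small probabilities underflows to 0.0.
prob = 1.0
for _ in range(frames):
    prob *= p_step

# -log(0) is infinite, which is one way a cost of Inf can show up.
cost_linear = -math.log(prob) if prob > 0.0 else math.inf

# Log domain: summing log-probabilities stays well within float range.
log_prob = sum(math.log(p_step) for _ in range(frames))
cost_log = -log_prob

print(prob, cost_linear, cost_log)
```

In single precision the same underflow would be reached with far fewer frames, since the smallest representable value is much larger.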

How can we handle this problem? Do you have any suggestions?

I look forward to your comments.

Best regards.

rakeshvar commented 9 years ago

Thanks for bringing it to my notice. Unfortunately, my code does not work for longer sequences. You will need to implement things in log space, which is slower. Take a look at this. I think that code works for TIMIT. In the meantime, I will try to add log-space CTC.
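For readers unfamiliar with the log-space approach, here is a rough sketch of a CTC forward pass that keeps every quantity as a log-probability. It is plain Python for illustration, not the Theano implementation from this repository; the names `logsumexp` and `ctc_neg_log_likelihood`, the `blank=0` convention, and the input layout (a T x C list of per-frame log-softmax outputs) are all assumptions made for this example.

```python
import math

NEG_INF = float("-inf")

def logsumexp(*xs):
    """Stable log(sum(exp(x))) over a few log-domain values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_neg_log_likelihood(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under CTC.

    log_probs: T x C nested list of per-frame log-probabilities.
    labels: target label indices, without blanks.
    """
    # Extended target with blanks interleaved: [b, l1, b, l2, ..., b]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    # Forward variables at t = 0: a path may start in the first blank
    # or in the first real label.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]                    # stay in the same state
            if s >= 1:
                terms.append(alpha[s - 1])        # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])        # skip the blank between
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end in the last label or in the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total

# Usage: 400 frames, 62 classes, uniform outputs. In the linear domain
# (1/62)**400 underflows to 0.0 even in double precision.
T, C = 400, 62
uniform = [[math.log(1.0 / C)] * C for _ in range(T)]
nll = ctc_neg_log_likelihood(uniform, [5, 9, 5])
print(math.isfinite(nll))
```

The extra `exp` and `log` calls inside `logsumexp` are what make this slower than the linear-domain recursion, in exchange for a cost that stays finite on long utterances.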

rakeshvar commented 9 years ago

I added support for log-space. Please see if you have any success with that feature.

madhavsund commented 8 years ago

When I trained on digits, I got better results. When I used sentences with a larger number of frames, I got worse results. The longer sequences are not even aligned to any label.

I think the problem still exists...

Note: I used the latest code in trunk with log space enabled.