rakeshvar / rnn_ctc

Recurrent Neural Network and Long Short Term Memory (LSTM) with Connectionist Temporal Classification implemented in Theano. Includes a Toy training example.
Apache License 2.0
220 stars, 80 forks

speech recognition #5

Closed madhavsund closed 8 years ago

madhavsund commented 8 years ago

Can rnn_ctc be used for speech recognition?

rakeshvar commented 8 years ago

Yes. Read the reference book.

madhavsund commented 8 years ago

Thanks a lot. I tried it with a digit speech corpus. My setup: input dimension 40, output dimension 25, architecture (BDLSTM, {"nunits": 30}), 100 epochs, 3700 samples.

However, only silence ever fires. I tried the recurrent model and got the same problem. What could be the reason?

rakeshvar commented 8 years ago

I can look at it after a month. In the meantime, you can leave the details of the data etc. here.

madhavsund commented 8 years ago

OK, I will send the data. When I tried a handwritten digit corpus I got good accuracy, so my question is whether the input data should be scaled to the range 0 to 1 for the speech corpus too.

rakeshvar commented 8 years ago

Which handwritten digit corpus did you try? Can you send the details of that too?

madhavsund commented 8 years ago

I tried the MNIST database with sequence input. After normalizing the speech corpus, it is working fine. Thanks a lot.

rakeshvar commented 8 years ago

Good to know that. Can you give a link to the data here? Thanks.

rakeshvar commented 8 years ago

Here is a hand-written sentences dataset: http://www.iam.unibe.ch/fki/databases/iam-handwriting-database

madhavsund commented 8 years ago

Thanks for pointing that out.

I trained on the speech corpus. Here is the test output:

TEST

Shown : [8, 22, 13, 20] 8 22 13 20 Seen : [8, 22, 13, 20] 8 22 13 20

SoftMax Firings: 0¦ ¦ 1¦ ¦ 2¦ ¦ 3¦ ¦ 4¦ ¦ 5¦ ¦ 6¦ ¦ 7¦ ¦ 8¦█ ¦ 9¦ ¦ 10¦ ¦ 11¦ ¦ 12¦ ¦ 13¦ █ ¦ 14¦ ¦ 15¦ ¦ 16¦ ¦ 17¦ ¦ 18¦ ¦ 19¦ ¦ 20¦ █ ¦ 21¦ ¦ 22¦ █ ¦ 23¦ ¦ 24¦ ¦ 25¦ ███████████████████████████████████████████████████████████████¦

The output is correct, but the firings do not follow time: the network emits all the labels right at the start. Is that correct? I will share the data once it is fixed.

On Thu, Oct 29, 2015 at 10:26 AM, rakeshvar notifications@github.com wrote:

Here is a hand-written sentences dataset http://www.iam.unibe.ch/fki/databases/iam-handwriting-database.


rakeshvar commented 8 years ago

@madhavsund It can not be so. There is something wrong with the rendering/copy-paste/print mechanism. A lot of spaces do not render well. You need to use ``` quoted string while pasting on github.

Another possible explanation is that you are using a BDRNN.

madhavsund commented 8 years ago

Yes, I used a BDLSTM. I have attached a screenshot of the output.

screenshot from 2015-11-02 10 24 09

rakeshvar commented 8 years ago

Bidirectional will do that. I am happy to know that BDLSTM is working so well. If you do not want such behaviour you can try just LSTM.
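
Why early firings can still give the right answer: CTC decoding depends only on the order of the non-blank firings, not on when they occur. Below is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is not rnn_ctc's own decoding code; the function names are hypothetical, and it assumes the blank is the last class (index 25), as the printed firings above suggest.

```python
def ctc_greedy_decode(firings, blank):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    # Most active class in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in firings]
    decoded = []
    prev = None
    for label in best:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def one_hot_frames(path, n_classes):
    """Build idealized softmax firings (one row per frame) from a label path."""
    return [[1.0 if c == k else 0.0 for c in range(n_classes)] for k in path]


BLANK = 25  # assumption: blank is the last of the 26 output rows

# All labels fired in the first frames, then blanks (the BDLSTM case).
early = [8, 22, 13, 20] + [BLANK] * 8
# The same labels spread out over time.
spread = [BLANK, 8, 8, BLANK, 22, BLANK, BLANK, 13, BLANK, 20, 20, BLANK]

decoded_early = ctc_greedy_decode(one_hot_frames(early, 26), BLANK)
decoded_spread = ctc_greedy_decode(one_hot_frames(spread, 26), BLANK)
print(decoded_early, decoded_spread)
```

Both paths decode to the same label sequence, which is why the "Seen" line can match the "Shown" line even when every label fires at the start.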

madhavsund commented 8 years ago

Previously the data was normalized to the range 0 to 1. When I used the same network (BDLSTM) but normalized the data with z-scores (zero mean and unit variance), I got output that follows time. I am confused about which one is correct.

screenshot from 2015-11-02 11 05 53
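
For reference, the two normalizations being compared can be sketched as follows. This is a plain-Python illustration with hypothetical function names, assuming the features are stored as a list of frames, each a list of per-dimension values (for example 40 values per frame); it is not the preprocessing code actually used in this thread.

```python
from statistics import mean, pstdev


def minmax_scale(frames):
    """Scale every feature dimension to the range [0, 1]."""
    columns = list(zip(*frames))  # one tuple per feature dimension
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(frame, lows, highs)]
        for frame in frames
    ]


def zscore_scale(frames):
    """Shift every feature dimension to zero mean and unit variance."""
    columns = list(zip(*frames))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]  # population standard deviation
    return [
        [(x - m) / s if s > 0 else 0.0
         for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


# Tiny made-up example: 3 frames, 2 feature dimensions.
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled01 = minmax_scale(frames)
scaled_z = zscore_scale(frames)
print(scaled01)
print(scaled_z)
```

Whichever scheme is used, the statistics (min/max or mean/standard deviation) are normally computed on the training set and then reused unchanged for the test data.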

madhavsund commented 8 years ago

Could you kindly guide me on how to save the network, so that I can run it on new or test images in live mode?
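
One generic way to do this, sketched under assumptions: collect the trained parameter values into a dictionary and pickle it, then load the dictionary later and copy the values back into a freshly built network. The helper names, parameter names, and file name below are hypothetical and are not part of rnn_ctc. With Theano shared variables, the values would typically be read with `get_value()` and restored with `set_value()`.

```python
import os
import pickle
import tempfile


def save_params(params, path):
    """Write a dict of parameter values to disk."""
    with open(path, "wb") as f:
        pickle.dump(params, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_params(path):
    """Read the dict of parameter values back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in values; in practice these would be the trained weights,
# e.g. {name: shared_var.get_value() for each shared variable}.
params = {"W_in": [[0.1, -0.2], [0.3, 0.4]], "b": [0.0, 0.0]}

path = os.path.join(tempfile.mkdtemp(), "net_params.pkl")
save_params(params, path)
restored = load_params(path)
print(restored == params)
```

The normalization statistics used during training would need to be saved alongside the weights, so that new inputs can be scaled the same way before testing.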

rakeshvar commented 8 years ago

Can you give me a link to the data (my Gmail address is the same as my GitHub ID), so that I can replicate the problem on my computer? I will get back to you when I know exactly what is happening.