Beam_search_decoder error of "list index out of range"

vrenkens / nabu

Code for end-to-end ASR with neural networks, built with TensorFlow

MIT License


Beam_search_decoder error of "list index out of range" #42

Closed AzizCode92 closed 6 years ago

AzizCode92 commented 6 years ago

I finished training my model and tested it without errors, but when I ran the decoder I got a "list index out of range" error raised in beam_search_decoder.py at line 131: text = ' '.join([self.alphabet[s] for s in sequence]) The alphabet I used for the decoder is:
\<space> a b c d e f g h i j k l m n o p q r s t u v w x y z \<unk>

I believe the error occurs because my alphabet has length 28 while the sequence has length 200; since the two lists differ in length, the error is thrown.

vrenkens commented 6 years ago

Hi, which recipe are you using, and have you pulled the latest version? The error is not caused by the difference in length, but by the model output dimension being different from the alphabet size.
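To make the distinction concrete, here is a minimal sketch of the lookup from line 131, written as a standalone function. The helper name `to_text` and the example label values are assumptions for illustration, not part of Nabu. It suggests that a 200-step sequence decodes fine on its own, and that the error only appears once a label is at least as large as the alphabet size, which a 39-output model could produce.

```python
# Alphabet from the issue: <space>, a-z and <unk>, 28 symbols in total.
alphabet = ['<space>'] + [chr(c) for c in range(ord('a'), ord('z') + 1)] + ['<unk>']


def to_text(sequence, alphabet):
    """Map label indices to symbols, mirroring the line quoted in the issue.

    Hypothetical standalone helper, not the actual Nabu method.
    """
    return ' '.join([alphabet[s] for s in sequence])


# A sequence of length 200 is fine as long as every label is < len(alphabet).
long_sequence = [i % len(alphabet) for i in range(200)]
decoded = to_text(long_sequence, alphabet)

# A model with 39 outputs could emit a label such as 38,
# which has no entry in a 28-symbol alphabet.
try:
    to_text([1, 2, 38], alphabet)
    error = None
except IndexError as exc:
    error = str(exc)
```

Here `error` holds the same message as in the report, while the 200-step sequence decodes without trouble.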

AzizCode92 commented 6 years ago

Yes, it was the latest version of Nabu. I'm using the LAS recipe for LibriSpeech.

AzizCode92 commented 6 years ago

Correct. Looking back at the dim_output of my model, I found it is 39, while the alphabet has 28 symbols including \<space> and \<unk>. I think I have to retrain the model? What should the output dimension of the model be in this case, 26 or 28?

vrenkens commented 6 years ago

I am afraid so, yes

AzizCode92 commented 6 years ago

Have you worked with the LibriSpeech dataset before? If so, what would the alphabet be? How do you choose this parameter? Thank you.

vrenkens commented 6 years ago

I have not. It depends on what you want to do in terms of special characters and punctuation. You have to normalize your text using the normalizers so that the text only contains characters from your alphabet.
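As a rough illustration of what such a normalization step might look like, the sketch below lowercases a transcription and maps every character onto the 28-symbol alphabet from this thread. The function name `normalize` and the choice to map punctuation to \<unk> (rather than dropping it) are assumptions made for this example; it is not the actual Nabu normalizer code.

```python
# Alphabet from this thread: <space>, a-z and <unk>.
ALPHABET = ['<space>'] + list('abcdefghijklmnopqrstuvwxyz') + ['<unk>']


def normalize(transcription, alphabet=ALPHABET):
    """Return a space-separated string that uses only symbols from alphabet.

    Hypothetical helper: spaces become <space>, known letters are kept,
    and anything else (digits, punctuation) is replaced by <unk>.
    """
    symbols = []
    for char in transcription.lower():
        if char == ' ':
            symbols.append('<space>')
        elif char in alphabet:
            symbols.append(char)
        else:
            symbols.append('<unk>')
    return ' '.join(symbols)


normalized = normalize('Hi, you')
```

Whether punctuation should become \<unk>, be removed, or be added to the alphabet is exactly the design choice described above.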

AzizCode92 commented 6 years ago

Thank you, I fixed the problem and I now have results on the LibriSpeech dataset with a WER of 4.81. I am planning to prepare a pull request for your work soon. For those reading this comment: please make sure that the dim_output of your model equals the alphabet size. Best, Aziz
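One way to catch this before training is a small sanity check along the following lines. This is only a sketch: `check_output_dim` and the way the alphabet is passed in as a space-separated string are assumptions for illustration and do not correspond to an existing Nabu function or config option.

```python
# Alphabet as a space-separated string, as listed earlier in this thread.
ALPHABET_STRING = '<space> a b c d e f g h i j k l m n o p q r s t u v w x y z <unk>'


def check_output_dim(dim_output, alphabet_string):
    """Raise ValueError if the model output dimension and alphabet size differ.

    Hypothetical helper; returns the alphabet size when the check passes.
    """
    alphabet = alphabet_string.split()
    if dim_output != len(alphabet):
        raise ValueError(
            'dim_output is %d but the alphabet has %d symbols'
            % (dim_output, len(alphabet)))
    return len(alphabet)


# The fixed setup from this thread passes the check.
size = check_output_dim(28, ALPHABET_STRING)

# The original setup (dim_output of 39) is rejected up front.
try:
    check_output_dim(39, ALPHABET_STRING)
    message = None
except ValueError as exc:
    message = str(exc)
```

Running a check like this when the configuration is loaded would surface the mismatch before any training time is spent.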