eurismarpires closed this issue 1 year ago
Hi! This is a project about voice recognition. The goal is binary classification: it only identifies whether a sound is the correct one, and it is not full speech recognition.
1. What does that mean?
Each value is the prediction the LSTM produces for one input after training. If a value is higher than a threshold, the input is identified as "Zhīma kāimén"; otherwise it is not the correct answer.
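A minimal sketch of that thresholding step, applied to the values posted in this thread. The threshold of 0.5 and the helper name `is_match` are assumptions for illustration only; the project does not state which threshold it uses.

```python
# Scores copied from the output posted in this thread.
outputs = [-0.11548512, -0.19542785, -0.28934821, 0.03601332,
           0.16698757, 0.67434764, -0.42637679]

# Hypothetical threshold: the project does not say which value it uses.
THRESHOLD = 0.5


def is_match(score, threshold=THRESHOLD):
    """Return True when the score is strictly above the threshold."""
    return score > threshold


# One boolean per input: True means "identified as the correct phrase".
labels = [is_match(score) for score in outputs]

for score, label in zip(outputs, labels):
    print(f"{score:+.4f} -> {'match' if label else 'no match'}")
```

With this assumed threshold only the sixth value (0.67434764) counts as a match. A different threshold would change which inputs are accepted.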
2. What is the language of the wav files and what is being said?
The wav files are in Chinese. The correct phrase is "芝麻開門", pronounced "Zhīma kāimén", which means "Open Sesame!". The other files are wrong sounds.
I have not yet worked out how to extend the project further, so it is not complete.
Thanks for replying. I am trying to develop a voice recognition system, and your project is helping me understand how to map MFCCs to an LSTM network.
I ran the code and got the following output:
for ans in output: print (ans)
[-0.11548512] [-0.19542785] [-0.28934821] [ 0.03601332] [ 0.16698757] [ 0.67434764] [-0.42637679]
What does that mean?
What is the language of the wav files and what is being said?