section-engineering-education / engineering-education

“Section's Engineering Education (EngEd) Program is dedicated to offering a unique quality community experience for computer science university students.”
Apache License 2.0

end-to-end speech recognition with recurrent neural networks #4075

Closed by extravaganza77 2 years ago

extravaganza77 commented 3 years ago

Topic Suggestion

End-to-end speech recognition with deep Recurrent Neural Networks (RNNs)

Pre-submission advice

By following all our pre-submission advice and reviewing our Resources folder, you will maximise your chances of your topic being approved.

We ask that you please be patient as our team works through approving and publishing all articles/tutorials in a timely manner.

Allow 1-3 days for a topic to be reviewed and/or approved, and 3-7 days for an article to be reviewed and/or published.

Be sure to visit our Resources Page for tools, resources, and example articles that will help you propose and write a successful article.

Please ensure that you have only one open issue + linked pull request at a time. This will ensure that we complete the article in a timely manner from inception to publishing.

We tend not to publish reviews or comparisons of commercial product offerings.

Proposal Submission

[Machine Learning] End-to-end speech recognition with deep Recurrent Neural Networks (RNNs)

Proposed article introduction

Neural networks have a long history in speech recognition, usually in combination with hidden Markov models (HMMs). They have gained attention in recent years with the dramatic improvements in acoustic modelling yielded by deep feedforward networks. Given that speech is an inherently dynamic process, it seems natural to consider recurrent neural networks (RNNs) as an alternative model. HMM-RNN systems have also seen a recent revival, but do not currently perform as well as deep networks.

Key takeaways

  1. Recurrent neural networks
  2. RNN Transducer
  3. Connectionist Temporal Classification
  4. Decoding a CTC network
  5. Regularising RNNs

Article quality

This article will present a speech recognition system that transcribes audio data directly into text, without requiring an intermediate phonetic representation. The system is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification (CTC) objective function. My article will also cover, in depth, deep learning frameworks that could serve as alternatives to RNNs.
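To make the CTC decoding idea in this proposal a little more concrete, here is a minimal sketch of greedy (best-path) decoding in plain Python. It is only an illustration under stated assumptions: the function name `ctc_greedy_decode`, the toy alphabet, the choice of index 0 as the blank symbol, and the hand-made per-frame scores are all hypothetical and not taken from the proposal, and a full article would more likely use beam search over real network outputs.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores: list of per-frame score lists, one score per label.
    alphabet:     list mapping label index -> output character.
    blank:        index of the CTC blank label (assumed to be 0 here).
    """
    # Pick the highest-scoring label index at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge consecutive repeats of the same label, then remove blanks.
    merged = [label for label, _ in groupby(best_path)]
    return "".join(alphabet[label] for label in merged if label != blank)


# Toy example: index 0 is the blank, written "-" only for readability.
alphabet = ["-", "a", "c", "t"]
path = [2, 2, 0, 1, 1, 0, 3]  # hypothetical per-frame winners: c c - a a - t
frames = [[1.0 if i == k else 0.0 for i in range(len(alphabet))] for k in path]

print(ctc_greedy_decode(frames, alphabet))  # prints "cat"
```

The blank label is what lets CTC emit doubled letters: a path such as `a - a` decodes to "aa", while `a a` without a blank in between collapses to a single "a".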

References

N/A

Conclusion

The reader will be shown that the combination of deep, bidirectional Long Short-Term Memory RNNs with end-to-end training and weight noise gives state-of-the-art results in phoneme recognition on the TIMIT database. An obvious next step is to extend the system to large vocabulary speech recognition. Another interesting direction would be to combine frequency-domain convolutional neural networks

ahmadmardeni1 commented 3 years ago

Good afternoon and thank you for submitting your topic suggestion. Your topic form has been entered into our queue and should be reviewed (for approval) as soon as a content moderator is finished reviewing the ones in the queue before it.

lalith1403 commented 3 years ago

Great topic 🚀, make sure it matches the following:

Please reference any relevant EngEd articles in yours and build a unique project - Approved.