ubclaunchpad / hermes

👟 On the way to captioning dialogs
2 stars, 1 fork

Research #1

Open grig-guz opened 6 years ago

grig-guz commented 6 years ago

Research 👨‍🔬

Before starting, we need to look at what people have already done. Possible sources:

  1. Medium
  2. Online tutorials
  3. arXiv (if you dare)

Task

Find at least 3 approaches and evaluate them in terms of accuracy, complexity, and data requirements.

AhmedAbdelmoneim commented 6 years ago

I can do this one

grig-guz commented 6 years ago

@AhmedAbdelmoneim make sure to assign yourself when you choose a task

AhmedAbdelmoneim commented 6 years ago

Here are some papers I found:

https://www.isca-speech.org/archive/archive_papers/interspeech_2011/i11_0437.pdf - uses context-dependent deep neural network HMMs (CD-DNN-HMMs); achieved a word error rate (WER) of 18.5%

http://proceedings.mlr.press/v32/graves14.pdf - uses an LSTM RNN; reported WERs ranging from 6.7% to 27.3% depending on the setup

https://www.isca-speech.org/archive/archive_papers/interspeech_2010/i10_1045.pdf - uses a recurrent neural network language model (RNN LM); achieved a WER of 9.5%

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7472621&tag=1 - I don't fully understand the architecture yet, but it is split into two components: a listener (an encoder over the acoustic input) and a speller (an attention-based decoder that outputs characters). Achieved a WER of 14.1%. Also cool because there is a GitHub repo implementing most of it: https://github.com/v0lta/Listen-attend-and-spell
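To make the listener/speller split a bit more concrete: at each output step, the speller scores its current state against every listener output, turns the scores into weights with a softmax, and takes the weighted sum of listener outputs (the context vector), which it then uses to predict the next character. Below is a minimal sketch of just that attention step in plain Python. It assumes simple linear projections `w_s` and `w_h` in place of the small MLPs the paper uses, and all names here (`speller_attention`, `matvec`) are hypothetical, not taken from the paper or the linked repo.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def speller_attention(
    state: Vector,               # current speller state s_i
    listener_out: List[Vector],  # listener outputs h_1..h_U, one per frame
    w_s: Matrix,                 # hypothetical linear stand-in for the paper's MLP on s_i
    w_h: Matrix,                 # hypothetical linear stand-in for the paper's MLP on h_u
) -> Tuple[Vector, Vector]:
    """One attention step: returns (weights over frames, context vector)."""
    query = matvec(w_s, state)
    keys = [matvec(w_h, h) for h in listener_out]
    # Energy per frame: dot product of the projected state and projected frame.
    energies = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Softmax over frames (subtract the max for numerical stability).
    top = max(energies)
    exps = [math.exp(e - top) for e in energies]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the listener outputs.
    dim = len(listener_out[0])
    context = [sum(w * h[d] for w, h in zip(weights, listener_out)) for d in range(dim)]
    return weights, context


# Toy example: 3 listener frames of size 2, identity projections.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, context = speller_attention([2.0, 0.0], frames, identity, identity)
print([round(w, 3) for w in weights])  # [0.468, 0.063, 0.468]
```

The frames that line up with the speller's state (the first and third) get most of the weight, so the context vector mostly reflects them. In the real model this step is repeated for every output character, with a new state each time.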

Deep Speech paper (Baidu): https://arxiv.org/pdf/1412.5567.pdf, open-source implementation by Mozilla: https://github.com/mozilla/DeepSpeech
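Since every paper above reports WER, it helps to pin down what the number means: the word-level edit distance (substitutions + deletions + insertions) between the system output and the reference transcript, divided by the number of words in the reference. A minimal sketch, assuming plain whitespace tokenization and equal cost for all three edit types; `word_error_rate` is a hypothetical helper, not code from any of the linked repos.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")  # WER = 0.333
```

Because insertions also count as errors, WER can go above 100%. Also, the papers above do not all evaluate on the same test set, so their WERs are only directly comparable when two systems are scored on the same data.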