mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

Keyword detection from WAV file #1314

Open lsahlstr opened 6 years ago

lsahlstr commented 6 years ago

We would like a keyword detection enhancement to DeepSpeech, i.e., the ability to detect a keyword or phrase directly from a WAV audio file. We saw "keyword spotting" listed in the Meeting Notes as a potential future ask, so maybe this enhancement is already on the near horizon?

We are looking for keyword search similar to Kaldi (http://kaldi-asr.org/doc/kws.html) or CMU Sphinx (https://sourceforge.net/p/cmusphinx/discussion/help/thread/9234e9d4/).

chesterkuo commented 6 years ago

Check this to see if it helps.

I created a CNN for keyword spotting. Inference on a WAV file takes about 70 ms, and the model file is 5 MB (it can be made smaller, but accuracy may drop).

https://github.com/chesterkuo/kaggle-speech-challenge-1

1337sup3rh4x0r commented 6 years ago

Snowboy (https://github.com/kitt-ai/snowboy) could be used to trigger recording of the actual WAV, which is then transcribed by DeepSpeech.
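
For illustration, the trigger-then-transcribe idea could be sketched roughly as below. This is only a minimal sketch using the Python standard library: `is_hotword` and `transcribe` are hypothetical callables that stand in for a hotword detector and a speech-to-text engine, and the 100 ms chunk size is an arbitrary assumption. Neither the Snowboy nor the DeepSpeech API is shown here.

```python
import wave
from typing import Callable, Iterator, Optional


def read_chunks(wav_path: str, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration chunks of raw PCM bytes from a WAV file."""
    with wave.open(wav_path, "rb") as wav:
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk


def transcribe_after_hotword(
    wav_path: str,
    is_hotword: Callable[[bytes], bool],   # hypothetical hotword detector
    transcribe: Callable[[bytes], str],    # hypothetical speech-to-text call
    chunk_ms: int = 100,
) -> Optional[str]:
    """Transcribe the audio that follows the first hotword hit.

    Returns None if the hotword is never detected.
    """
    recorded = bytearray()
    triggered = False
    for chunk in read_chunks(wav_path, chunk_ms):
        if triggered:
            # After the trigger, keep every remaining chunk for transcription.
            recorded.extend(chunk)
        elif is_hotword(chunk):
            # The chunk containing the hotword itself is not recorded.
            triggered = True
    return transcribe(bytes(recorded)) if triggered else None
```

In practice, `is_hotword` would wrap whatever the hotword detector exposes and `transcribe` would wrap the speech-to-text engine. Keeping both as parameters means the control flow can be tested without either library installed.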

JRMeyer commented 4 years ago

A possible implementation:

"Unrestricted Vocabulary Keyword Spotting using LSTM-CTC"

https://www.isca-speech.org/archive/Interspeech_2016/pdfs/0753.PDF