mozilla / DeepSpeech

DeepSpeech is an open-source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.
Mozilla Public License 2.0

Keyword detection from WAV file #1314

Open lsahlstr opened 6 years ago

lsahlstr commented 6 years ago

We would like a keyword detection enhancement to DeepSpeech, i.e., the ability to detect a keyword or phrase directly from a WAV audio file. We saw "keyword spotting" in the Meeting Notes as a potential future ask, so maybe it is an enhancement on the near horizon?

We are looking for keyword search similar to what Kaldi or CMU Sphinx provide.

chesterkuo commented 6 years ago

Check this to see if it helps.

I created a CNN for keyword spotting. Inference time for a WAV file is ~70 ms, and the model file is 5 MB (it can be made smaller, but accuracy may drop).

1337sup3rh4x0r commented 6 years ago

Snowboy could be used to trigger recording of the actual WAV, which is then transcribed by DeepSpeech.
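Once a transcript exists, the keyword can also be located in it after the fact. Below is a minimal sketch of that lookup, not a DeepSpeech feature: the `Token` type and `find_keyword` function are hypothetical names, and the code assumes the recognizer yields one token per character with a start time in seconds.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    """Hypothetical per-character token: its text and start time in seconds."""
    text: str
    start_time: float


def find_keyword(tokens: List[Token], keyword: str) -> List[Tuple[float, float]]:
    """Return (start, end) start-times for each whole-word match of `keyword`.

    Matching is case-insensitive. Assumes every token holds exactly one
    character, so string offsets map directly onto token indices.
    """
    needle = keyword.lower().strip()
    if not needle:
        return []
    transcript = "".join(t.text for t in tokens).lower()
    hits = []
    pos = transcript.find(needle)
    while pos != -1:
        end = pos + len(needle)
        # Accept the match only if it is bounded by spaces or the transcript edges.
        before_ok = pos == 0 or transcript[pos - 1] == " "
        after_ok = end == len(transcript) or transcript[end] == " "
        if before_ok and after_ok:
            hits.append((tokens[pos].start_time, tokens[end - 1].start_time))
        pos = transcript.find(needle, pos + 1)
    return hits
```

Later DeepSpeech releases expose per-character timing through `sttWithMetadata` in the Python bindings, so the token list could plausibly be built from that output; treat that mapping as an assumption and check it against the version you run. Note that this only finds keywords the decoder actually transcribed, so it is weaker than a dedicated keyword spotter.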

JRMeyer commented 4 years ago

A possible implementation:

"Unrestricted Vocabulary Keyword Spotting using LSTM-CTC"
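For anyone exploring a CTC-based approach, one generic building block is scoring how likely a window of acoustic frames is to spell out a given keyword. The sketch below is the standard CTC forward algorithm in plain Python, not the method from the paper above; `ctc_log_prob` is a hypothetical name, and it assumes per-frame log-probabilities over an alphabet whose index 0 is the CTC blank.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_log_prob(log_probs: Sequence[Sequence[float]],
                 labels: List[int],
                 blank: int = 0) -> float:
    """Log-probability that the frames collapse to `labels` under CTC.

    log_probs[t][c] is the log-probability of symbol c at frame t.
    """
    # Extended sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    # Initialisation: a path may start with the leading blank or the first label.
    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        prev = alpha
        alpha = [NEG_INF] * size
        for s in range(size):
            candidates = [prev[s]]                  # stay on the same symbol
            if s >= 1:
                candidates.append(prev[s - 1])      # advance by one
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(prev[s - 2])
            alpha[s] = logsumexp(*candidates) + log_probs[t][ext[s]]

    # A valid path ends on the last label or the trailing blank.
    if size > 1:
        return logsumexp(alpha[-1], alpha[-2])
    return alpha[-1]
```

A keyword spotter could slide this score over windows of frames and fire when it crosses a threshold; the window length and threshold would have to be tuned on real data.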