synesthesiam / voice2json

Command-line tools for speech and intent recognition on Linux
MIT License

Question: configure voice detection delay/debounce for `transcribe-stream`? #61

Open hiinaspace opened 2 years ago

hiinaspace commented 2 years ago

Hi, I'm using `voice2json transcribe-stream` with short one-word commands to control a multirotor drone (e.g. "left", "right", "up"). Ideally the detector would respond as soon as possible after a word, but voice2json currently seems to wait at least 2 seconds after it detects a voice before passing the audio to the transcriber, judging by the 'end time' of the tokens object. Furthermore, if there's significant background noise (say, a buzzing quadcopter), voice2json keeps recording for up to 15 seconds before passing the audio on for transcription and emitting the JSON line.

Is there any way to configure the min/max delay for commands? I tried the `--timeout` option, but even with `--timeout 0` the latency from utterance to JSON line seems the same.
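
For anyone reproducing this, one way to put numbers on the delay is to timestamp each JSON line as it arrives and print it next to the end time of the last token. The following is a minimal sketch, not part of voice2json: the script name `measure_latency.py` is hypothetical, and the field names `text`, `tokens`, and `end_time` are assumptions based on the output described above, so adjust them to match what your profile actually emits.

```python
import json
import sys
import time


def last_token_end(event):
    """Return the largest end_time among the event's tokens, or None if absent.

    Assumes each token is a dict with an "end_time" key (an assumption about
    the output format, not a documented guarantee).
    """
    tokens = event.get("tokens") or []
    ends = [t["end_time"] for t in tokens if isinstance(t, dict) and "end_time" in t]
    return max(ends) if ends else None


def main():
    start = time.monotonic()
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON line
        if not isinstance(event, dict):
            continue
        # Seconds since this script started when the JSON line arrived.
        arrived = time.monotonic() - start
        print(
            f"arrived={arrived:.2f}s "
            f"text={event.get('text')!r} "
            f"last_token_end={last_token_end(event)}"
        )


if __name__ == "__main__":
    main()
```

Run it as `voice2json transcribe-stream | python measure_latency.py` and speak a command at a known moment; the gap between that moment and the printed arrival time approximates the end-to-end latency.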