synesthesiam / voice2json

Command-line tools for speech and intent recognition on Linux
MIT License
1.08k stars, 63 forks

Using wake-word and transcribe intent at same time #64

Open ramainen opened 2 years ago

ramainen commented 2 years ago

As I understand it, there are two modes: waiting for a wake word (wait-wake) and speech recognition (transcribe).

Both work for me, but how do I run both modes at once? I mean, the system must wait for the wake word and then start transcribing the stream for an intent command. Is there a recipe for this? Ideally, there would be JSON on stdout, one JSON object per line, appearing when I say the wake word AND an intent after it. Even better: a first JSON object when the wake word is detected (so I can light up an indicator LED and play "what do you want?" through the speaker), and then a second JSON object when the intent is recognized.

Thank you, by the way, for this cool project. It runs brilliantly on a Jetson Nano 2GB.

ramainen commented 2 years ago

I know about the bash recipe https://github.com/synesthesiam/voice2json/blob/master/recipes/launch_program/listen_and_launch.sh

But I believe the time between voice2json wait-wake starting and recognition actually starting is around 5-10 seconds. So after executing an intent, it will be unresponsive for quite a long time. Or do I misunderstand? Is this a recipe meant for real use, or just a proof of concept?

ramainen commented 2 years ago

So it takes 13 seconds to actually begin wake-word recognition. Of course, it will still detect the word if it was said during those 13 seconds, because of the Linux pipe buffer, but:

echo 'Waiting for wake word...'
voice2json wait-wake --exit-count 1

# Play a sound to tell the user we're recording
aplay "${this_dir}/beep_hi.wav"

# Record voice command until silence
echo 'Recording voice command...'

Between "Waiting for wake word..." and "Recording voice command..." 13 seconds pass if I say the next command right after the previous one.

ramainen commented 2 years ago

OK, I got it working by changing the audio device to allow simultaneous access. I pipe the output of wait-wake (without --exit-count) to the stdin of another program; that program records audio to a 3-second WAV file and does the intent handling. The wake-word detector keeps running during this time.
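
For anyone landing here later, the approach described above might look roughly like the sketch below. It is not the exact script used in this thread. It assumes that wait-wake prints one JSON object per detection on stdout, that the capture device can be opened by two processes at once (for example through an ALSA dsnoop device), and that the profile is already trained. The function names record_clip, recognize_clip and handle_wake_events are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: wake-word events arrive on stdin, one JSON object per line.

# Record a fixed 3-second clip (16 kHz, 16-bit, mono) to the given path.
# stdin is redirected so arecord cannot consume pending wake-word events.
record_clip() {
    arecord -q -d 3 -r 16000 -f S16_LE -c 1 "$1" < /dev/null
}

# Transcribe the clip, then recognize the intent; prints one JSON line.
recognize_clip() {
    voice2json transcribe-wav < "$1" | voice2json recognize-intent
}

# For every wake-word event: echo it, record a clip, print the intent JSON.
handle_wake_events() {
    local event wav
    while IFS= read -r event; do
        [ -n "$event" ] || continue      # skip blank lines
        printf '%s\n' "$event"           # first JSON line: wake word detected
        wav="$(mktemp)"
        # second JSON line: recognized intent (skipped if recording fails)
        record_clip "$wav" && recognize_clip "$wav"
        rm -f "$wav"
    done
}

# Usage (wake-word detection stays alive for the whole session):
#   voice2json wait-wake | handle_wake_events
```

Because wait-wake is started only once, its start-up delay is paid a single time instead of after every command. Detections that happen while a clip is being recorded are not lost; they wait in the pipe and are handled once the current command finishes.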