Patronics opened this issue 4 years ago
In this scenario you can use the "hotword" event instead. When a hotword is detected, this library emits "hotword" instead of "recognize". I didn't fully flesh out the hotword/command system, but what happens after a command is quite different from what happens in a continuously listening interactive loop, so I wanted these to be separate events.
deepspeech.on('hotword', function(text, stats) {
  console.log('you said:', text);
});
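For contrast, the continuously listening loop uses the "recognize" event, which has the same signature (this mirrors the handler that appears later in this thread):

deepspeech.on('recognize', function(text, stats) {
  // fires continuously while the interactive loop is listening
  console.log('you said:', text);
});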
In the jaxcore-bumblebee API, the "hotword" event is picked up and calls the "onCommand" method of the Bumblebee.Assistant -- it's not available in the application and not available in .loop(). Basically this is the difference between an app like "Hello World" and an assistant like "Bumblebee" -- the assistant listens for hotwords and acts as the controller for the applications.
So what you'd like to do is create an assistant. And there are some examples here:
https://github.com/jaxcore/bumblebee/blob/master/examples/nodejs-examples/assistants/
Take a look at the terminator assistant in particular; it has an onCommand method to pick up the DeepSpeech results.
You might have to clean up the text that gets returned, because the hotword itself might be returned in the recognition text.
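A rough sketch of that shape, going only by the description above -- the package name, constructor arguments, and exact onCommand signature should be checked against the linked terminator example, and the hotword-stripping line is just an illustration of the cleanup mentioned:

// Sketch only: Bumblebee.Assistant and onCommand are named in this thread;
// everything else here is an assumption to verify against the examples.
const Bumblebee = require('jaxcore-bumblebee');

class MyAssistant extends Bumblebee.Assistant {
  // Called when speech is recognized after this assistant's hotword
  onCommand(text) {
    // The hotword itself may be included in the recognition text, so strip it
    const command = text.replace(/^\s*terminator[\s,]*/i, '').trim();
    console.log('command:', command);
  }
}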
Thanks for the reply. I was initially looking into implementing an assistant as you suggest, but I didn't see any way to use one without the whole fancy Electron server application you made. It's definitely very cool, but overkill for my system, where I already have an Electron-based UI and just want voice control to switch between pages of the user interface and possibly answer a few questions from users. Can the assistants be used directly with this library, or with some other more minimal form of the server?
Also, I tried using the deepspeech.on('hotword') example you gave me, but it only ever seemed to output the hotword itself, and not the words following it. (Or would you suggest configuring a hotword for each keyword I need to listen for?)
Thanks for your help!
Oh okay.
There's a little more you'd have to know to get that to work. Basically, your app would need to partially replicate some of the things that are going on inside the Bumblebee Electron app.
This is the hotword detection library I wrote to handle Porcupine; it's the library in my Electron app that handles the microphone data and is the first thing to receive the hotword detection:
https://github.com/jaxcore/bumblebee-hotword
The bumblebee-deepspeech library is only for DeepSpeech handling. But like you, I also needed these libraries to work together so I could process speech that is spoken before or after saying the hotword (e.g. "what is the time, Bumblebee").
The process flow looks something like this:
bumblebee-hotword -> detects "Hello Bumblebee" -> sends audio + hotword via Electron IPC -> bumblebee-deepspeech -> DeepSpeech -> emits hotword event for "Hello Bumblebee"
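A minimal sketch of that handoff, assuming bumblebee and deepspeech instances constructed per each library's README; the IPC channel name here is hypothetical, while the event and method signatures match the code later in this thread:

// Renderer process: bumblebee-hotword emits raw audio chunks plus any
// detected hotword, which we forward to the main process over IPC.
const { ipcRenderer } = require('electron');

bumblebee.on('data', function (intData, sampleRate, hotword, float32arr) {
  ipcRenderer.send('hotword-audio', intData, sampleRate, hotword, float32arr);
});

// Main process: hand the audio to bumblebee-deepspeech, which streams it
// into DeepSpeech and emits "recognize"/"hotword" events with the results.
const { ipcMain } = require('electron');

ipcMain.on('hotword-audio', (event, intData, sampleRate, hotword, float32arr) => {
  deepspeech.streamData(intData, sampleRate, hotword, float32arr);
});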
Ok, thanks for the additional information. I'll continue digging into the code you've provided and see if I can figure something out. Thanks again for your help!
Before going too far with building your own system, I'd probably try relying on Bumblebee being installed and running on your system, and then copy the code from one of the example assistants into your app and see if it works. You just wouldn't be able to use "bumblebee" as the hotword; you'd have to use one of the other ones (grasshopper, hey edison, porcupine, terminator, white smoke, or blueberry).
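For reference, selecting one of those alternative hotwords looks roughly like this with bumblebee-hotword (method names here are from its README as best I recall -- treat this as a sketch, not a definitive API reference):

const Bumblebee = require('bumblebee-hotword-node');

const bumblebee = new Bumblebee();
// "bumblebee" is taken by the assistant host, so pick another hotword
bumblebee.addHotword('grasshopper');
bumblebee.on('hotword', function (hotword) {
  console.log('hotword detected:', hotword);
});
bumblebee.start();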
Also, I tried using the deepspeech.on('hotword') example you gave me, but it only ever seemed to output the hotword itself, and not the words following it. (Or would you suggest configuring a hotword for each keyword I need to listen for?)
I forgot to reply to this. It does work. The example here would need to be slightly modified to get the recognition results returned in the deepspeech "hotword" event:
deepspeech.on('recognize', (text, stats) => {
  console.log('Speech Recognition Result:', text);
});

bumblebee.on('hotword', function (hotword) {
  // The on/off toggle from the original example is commented out, so
  // recognition stays active instead of toggling with each hotword:
  // if (speechRecognitionActive) {
  //   console.log('\nSPEECH RECOGNITION OFF');
  //   console.log('\nStart speech recognition by saying:', 'BUMBLEBEE');
  //   playSoundFile(__dirname + '/bumblebee-off.wav');
  //   speechRecognitionActive = false;
  // }
  // else if (!speechRecognitionActive) {
  //   console.log('\nSPEECH RECOGNITION ON');
  //   console.log('Stop speech recognition by saying:', 'BUMBLEBEE');
  //   playSoundFile(__dirname + '/bumblebee-on.wav');
  //   speechRecognitionActive = true;
  // }
  // deepspeech.streamReset();
});

// Results arrive via the "recognize" handler above and this "hotword" handler:
deepspeech.on('hotword', function (text, stats) {
  console.log('hotword command:', text, stats);
});

bumblebee.on('data', function (intData, sampleRate, hotword, float32arr) {
  // With the toggle removed, every audio chunk is streamed to DeepSpeech:
  // if (speechRecognitionActive) {
  deepspeech.streamData(intData, sampleRate, hotword, float32arr);
  // }
});
I'm working on some code that's closely based on the hotword example, with the only differences (so far) being that it doesn't play the sound effect after hearing "bumblebee", and that there are a few if statements checking for particular words within the deepspeech.on('recognize') section.
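(For concreteness, those if statements look something like this -- the page names and the showPage helper are placeholders standing in for my actual UI code:)

deepspeech.on('recognize', (text, stats) => {
  // Placeholder page-switching logic; showPage is a stand-in for my UI code
  if (text.includes('settings')) {
    showPage('settings');
  } else if (text.includes('home')) {
    showPage('home');
  }
});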
Anyway, the issue is that the system doesn't seem to report whatever is said first after the hotword to the deepspeech.on('recognize') code. (Your hotword example cleverly (or accidentally) hides this by playing the sound effect, making that the first thing it hears.)
Clearly the information exists at some point during the process: when debugging is on (as with the microphone example), it does report "Recognized Text: " with the wake word and whatever immediately follows, apparently from here. However, this text never reaches the deepspeech.on('recognize') function, and I can't find any other obvious way to access it from the example programs, aside from modifying the library's source code.
Also interesting is that this doesn't apply just to words that immediately follow the wake word (with no delay), but to whatever is said after it, regardless of the time delay. For example, using the hotword example with the playSoundFile call commented out, after saying "bumblebee" and waiting 10 seconds (or any amount of time), the next thing you say won't be acknowledged by the system.