johanneskropf / node-red-contrib-voice2json

Node-RED nodes for local speech and intent recognition via voice2json
Apache License 2.0

node-red-contrib-voice2json

!!!This is very much a work in progress, so please use it carefully. The nodes only work with the latest version of voice2json and are not backwards compatible with voice2json 1.0.0!!!

Node-RED nodes that provide a simple wrapper for local speech and intent recognition on Linux via voice2json.

The voice2json project offers a collection of command line speech and intent recognition tools on Linux or in a Docker container.

Thanks to Bart Butenaers, my partner in crime for this node! He came up with the crazy idea that I should get involved in this business of node-red node development, and without his knowledge and his huge contribution to this node it wouldn't be here today.

:warning: Have a look at the step by step tutorial on our wiki page to get started with these nodes!

Install

Run the following npm command in your Node-RED user directory (typically ~/.node-red):

npm install johanneskropf/node-red-contrib-voice2json

Voice2Json installation

Install voice2json on the same machine as Node-RED. Detailed instructions can be found in the voice2json documentation. Voice2Json can be installed in one of the following ways:

  1. As a (pre-compiled) Debian package.
  2. As a Docker container.

Language profile installation

To be able to start voice recognition, a language profile needs to be installed. Download the profile for your preferred language from the list of supported languages. The directory where the language profile is stored needs to be entered in the config node screen (see further).

Remark: When using the Voice2Json Docker container, make sure the language profile is stored somewhere in your home directory. Otherwise Voice2Json will not be able to access it from its Docker container. If that is not possible, you will need to make the path accessible for the Docker container, by adding an additional -v argument in the above Voice2Json Docker run bash script.

Node Usage

This suite offers 5 Node-RED nodes in the Node-RED palette, located in the "Voice2Json" section:

Palette

Those nodes can be combined to create a complete local voice setup:

Overview

Note that all the example flows from this page can easily be installed via the Node-RED "Import" menu:

Import menu

As a prerequisite for the example flows to work, please install voice2json and download the en-us kaldi profile. Once everything is downloaded and set up, import the example flow, open the config node and change the profile path to the location where you downloaded the profile.

Nodes

Config node

Create a config node for each installed voice2json language profile. In most cases a single language profile will be sufficient.

The config node contains the following information:

File Handling & Profile Sync

The config node allows you to edit the sentences file and the slot files of each language profile directly from Node-RED. This comes with some caveats. When you first create a config for a new profile path, the node does not automatically load the sentences or slots that may already be present in the language profile directory; instead it presents you with a clean slate. If you want to import the sentences or slots already present in the profile folder, use the separate load-from-profile buttons on the sentences tab and the slots tab. You might do this to see the example sentences that come with a profile, or to import the sentences and/or slots from an existing profile that you made on another machine and copied here. It is also a handy way to get your profile back in sync if you made manual external changes to sentences or slots that you otherwise manage from Node-RED. If you do not load the sentences/slots from the profile and there are pre-existing ones, or you made external changes, the node will overwrite the sentences with the Node-RED version on deploy! For slot files there are a couple of options:

Please make sure to create frequent backups of the relevant files in the profile folders, especially when importing an existing voice2json profile for use with Node-RED!

Training node

The training node enables the training of a profile from node-red.

To start training, select the profile to train in the node's config and then, after deploying, send msg.payload = "train" via the input message:

Training flow

[{"id":"307ba520.0db2fa","type":"voice2json-training","z":"11289790.c89848","name":"","voice2JsonConfig":"3cf7b405.ee3c5c","inputField":"payload","outputField":"payload","loadedProfile":"","x":410,"y":320,"wires":[["3762bcf3.2585c4"]]},{"id":"6aaceed9.49082","type":"inject","z":"11289790.c89848","name":"Start training","topic":"","payload":"train","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":190,"y":320,"wires":[["307ba520.0db2fa"]]},{"id":"3762bcf3.2585c4","type":"debug","z":"11289790.c89848","name":"Training result","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":640,"y":320,"wires":[]},{"id":"3cf7b405.ee3c5c","type":"voice2json-config","z":"","profilePath":"/home/pi/voice2json_profile/en-us_kaldi-zamia-2.0","name":"Kaldi english profile","sentences":"[GetTime]\nwhat time is it\ntell me the time\n\n[GetTemperature]\nwhats the temperature\nhow (hot | cold) is it\n\n[GetGarageState]\nis the garage door (open | closed)\n\n[ChangeLightState]\nlight_name = ((living room lamp | garage light) {name}) | <ChangeLightColor.light_name>\nlight_state = (on | off) {state}\n\nturn <light_state> [the] <light_name>\nturn [the] <light_name> <light_state>\n\n[ChangeLightColor]\nlight_name = (bedroom light) {name}\ncolor = (red | green | blue) {color}\n\nset [the] <light_name> [to] <color>\nmake [the] <light_name> <color>","slots":[{"fileName":"slot1","managedBy":"external","fileContent":null,"executable":false},{"fileName":"fold_a/fold_b/fold_c/testslot","managedBy":"external","fileContent":null,"executable":false},{"fileName":"rhasspy/number","managedBy":"external","fileContent":null,"executable":true}],"removeSlots":true}]

An output message will be sent, containing the training commandline output lines:

result:
/usr/lib/voice2json/lib/kaldi/egs/wsj/s5/utils/prepare_lang.sh /home/pi/de_kaldi-zamia-2.0/acoustic_model/data/local/dict  /home/pi/de_kaldi-zamia-2.0/acoustic_model/data/local/lang /home/pi/de_kaldi-zamia-2.0/acoustic_model/data/lang
Checking /home/pi/de_kaldi-zamia-2.0/acoustic_model/data/local/dict/silence_phones.txt ...
--> reading /home/pi/de_kaldi-zamia-2.0/acoustic_model/data/local/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> /home/pi/de_kaldi-zamia-2.0/acoustic_model/data/local/dict/silence_phones.txt is OK
...

Since the output is a big blob of text (instead of JSON), the Node-RED debug panel will not show the entire output. The best way to see the whole training result is to write the training node output to a file using the File-Out node.

Wait Wake node

A node to listen to a stream of raw audio buffers and detect a wake-word in that stream. When a wake word is detected:

  1. It sends a message on the first output, containing the detected wake word, the time of detection relative to the node's start, and a Unix timestamp.
  2. If the Forward audio to 2nd output on detection option is checked, the node will start ignoring any detected wake words after a detection and start forwarding the raw audio chunks to its second output. The forwarding continues until an input message is injected, containing listen in the configured control property. Then it will stop forwarding and start listening for a wake word again.

The following figure explains how the wake-word will open the gate (thus forwarding the stream to the second output), and how the listen command will close the gate again:

Wake word
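
The gate behaviour described above can be modelled in a few lines. This is only a simplified sketch of the logic, not the node's actual source code; the class name `WakeGate` and its method names are made up for illustration.

```javascript
// Simplified model of the wait-wake gate in forward mode.
// Hypothetical names; the real node is implemented differently.
class WakeGate {
    constructor() {
        // false: listening for a wake word, true: forwarding audio
        this.forwarding = false;
    }

    // Called when a wake word is detected in the audio stream.
    // Returns true if the detection is accepted, false if it is ignored
    // because the gate is already open.
    onWakeWord() {
        if (this.forwarding) {
            return false;
        }
        this.forwarding = true;
        return true;
    }

    // Called with the content of the control property of an input message.
    onControl(command) {
        if (command === "listen") {
            this.forwarding = false;
        }
    }

    // Returns the chunk to send on the second output, or null if the
    // gate is closed.
    onAudioChunk(chunk) {
        return this.forwarding ? chunk : null;
    }
}
```

While the gate is closed, no audio chunk reaches the second output. After a wake word the chunks pass through until a listen command closes the gate again.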

The second output can be connected directly to the Record-Command node, to record a command after a wake word was detected (when in forward mode). This way the wait-wake node acts as a gate for the Record-Command node, so that it only starts recording once a wake word was detected. This avoids the Record-Command node having to process all conversations, which would be a waste of resources and could lead to unpredictable results.

The wake-word listening process can be stopped at any time by injecting an input message containing stop in the configured control property. Note that the wait-wake node will automatically start up again after a timeout of 2 seconds if you don't stop the input audio stream when stopping this node. This way the stop command can be used to restart the node.

A possible source for the input stream of raw audio buffers is node-red-contrib-sox-record which should work out of the box with this node.

Here is a small example flow using the node-red-contrib-sox-record node and a connected microphone to see the wait-wake node in action:

[{"id":"d70f5ed2.7088","type":"sox-record","z":"6417ff5b.9a455","name":"","buttonStart":"button","inputs":0,"inputSource":"1,0","byteOrder":"-L","encoding":"signed-integer","channels":1,"rate":16000,"bits":16,"gain":"0","lowpass":8000,"showDuration":false,"durationType":"forever","durationLength":0,"silenceDetection":"nothing","silenceDuration":"2.0","silenceThreshold":"2.0","outputFormat":"stream","manualPath":"color","debugOutput":false,"x":170,"y":1660,"wires":[["5b72f1e6.7d8e2"],[]]},{"id":"5b72f1e6.7d8e2","type":"voice2json-wait-wake","z":"6417ff5b.9a455","name":"","voice2JsonConfig":"a66d83bd.16a7d8","inputField":"payload","controlField":"control","outputField":"payload","nonContinousListen":true,"x":380,"y":1660,"wires":[["41502e87.5abfd","3a18ade7.ce6202"],["dc92e18d.01e8c"]]},{"id":"41502e87.5abfd","type":"debug","z":"6417ff5b.9a455","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":590,"y":1600,"wires":[]},{"id":"3a18ade7.ce6202","type":"trigger","z":"6417ff5b.9a455","op1":"","op2":"listen","op1type":"nul","op2type":"str","duration":"3","extend":false,"units":"s","reset":"","bytopic":"all","name":"3s than listen","x":590,"y":1660,"wires":[["5b72f1e6.7d8e2"]]},{"id":"dc92e18d.01e8c","type":"debug","z":"6417ff5b.9a455","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":590,"y":1720,"wires":[]},{"id":"c0769b02.c6d23","type":"inject","z":"6417ff5b.9a455","name":"stop (restart)","topic":"","payload":"stop","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":170,"y":1600,"wires":[["5a559572.e002fc"]]},{"id":"5a559572.e002fc","type":"change","z":"6417ff5b.9a455","name":"","rules":[{"t":"move","p":"payload","pt":"msg","to":"control","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":370,"y":1600,"wires":[["5b72f1e6.7d8e2"]]},{"id":"c3f82253.925cc","type":"comment","z":"6417ff5b.9a455","name":"wait-wake 
example","info":"Prerequisites for this example flow are that you must have a [node-red-sox-utils](https://github.com/johanneskropf/node-red-contrib-sox-utils) installed and a microphone connected to your raspberry or other device. Choose your input device in the mic nodes config and click the button to start recording. After a brief start up period the wait-wake node can be triggered by speaking the standard wake word of *hey mycroft* if no custom wake word has been configured in the selected profiles profile.yml. The wait-wake node will than forward the audio from the mic for three seconds on its second output and ignore wake words until told to listen again. You can restart (stop) the node by injecting start to the control topic.","x":410,"y":1540,"wires":[]},{"id":"a66d83bd.16a7d8","type":"voice2json-config","z":"","profilePath":"/home/pi/en-us_kaldi-zamia-2.0","name":"enUsKaldi","sentences":"[GetTime]\nwhat time is it\ntell me the time\n\n[GetTemperature]\nwhats the temperature\nhow (hot | cold) is it\n\n[GetGarageState]\nis the garage door (open | closed)\n\n[ChangeLightState]\nlight_name = ((living room lamp | garage light) {name}) | <ChangeLightColor.light_name>\nlight_state = (on | off) {state}\n\nturn <light_state> [the] <light_name>\nturn [the] <light_name> <light_state>\n\n[ChangeLightColor]\nlight_name = (bedroom light) {name}\n\nset [the] <light_name> [to] $color\nmake [the] <light_name> $color","slots":[{"fileName":"rhasspy/number","managedBy":"external","fileContent":null,"executable":true},{"fileName":"color","managedBy":"external","fileContent":null,"executable":false}],"removeSlots":true}]

Currently the default wake word is "hey mycroft". If you want to set up a custom wake word, you can find more information in the voice2json documentation.

Record Command node

A node to record a voice command from a stream of raw audio buffers. The record command node will:

As soon as it stops recording, it will send a single buffer to the configured output, which is a WAV audio object containing the chunks of the detected speech command:

Recording chunks

If the input audio stream is not stopped, it automatically will start recording a new command after a 2 second timeout.

The input of this node can be connected directly to the second output of the wait-wake node in forward mode, or to any other node that can send a stream of raw audio buffers in the correct format. The output WAV buffer can be fed directly to the voice2json STT node input for transcription.

Here is a simple example flow:

[{"id":"184ac771.24eb91","type":"sox-record","z":"6417ff5b.9a455","name":"","buttonStart":"msg","inputs":1,"inputSource":"1,0","byteOrder":"-L","encoding":"signed-integer","channels":1,"rate":16000,"bits":16,"gain":"0","lowpass":8000,"showDuration":false,"durationType":"forever","durationLength":0,"silenceDetection":"nothing","silenceDuration":"2.0","silenceThreshold":"2.0","outputFormat":"stream","manualPath":"","debugOutput":false,"x":290,"y":1900,"wires":[["ae954e1b.8764f"],[]]},{"id":"b21a3465.f1e6d","type":"inject","z":"6417ff5b.9a455","name":"","topic":"","payload":"start","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":130,"y":1900,"wires":[["184ac771.24eb91"]]},{"id":"ea9f63d8.ea2ba8","type":"change","z":"6417ff5b.9a455","name":"stop","rules":[{"t":"set","p":"payload","pt":"msg","to":"stop","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":370,"y":1960,"wires":[["184ac771.24eb91"]]},{"id":"ae954e1b.8764f","type":"voice2json-record-command","z":"6417ff5b.9a455","name":"","voice2JsonConfig":"a66d83bd.16a7d8","inputField":"payload","outputField":"payload","x":530,"y":1900,"wires":[["ea9f63d8.ea2ba8","4d0e9d91.67e32c"]]},{"id":"4d0e9d91.67e32c","type":"debug","z":"6417ff5b.9a455","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":770,"y":1900,"wires":[]},{"id":"6550c0a.e92874","type":"comment","z":"6417ff5b.9a455","name":"record command example","info":"Prerequisites for this example flow are that you must have a [node-red-sox-utils](https://github.com/johanneskropf/node-red-contrib-sox-utils) installed and a microphone connected to your raspberry or other device. Choose your input device in the mic nodes config and inject start to start recording and say something.\nThe record-command node will now listen to the stream of buffers from the microphone and as soon as it detects silence it will emit a single wav buffer containing the spoken command. 
","x":450,"y":1840,"wires":[]},{"id":"a66d83bd.16a7d8","type":"voice2json-config","z":"","profilePath":"/home/pi/en-us_kaldi-zamia-2.0","name":"enUsKaldi","sentences":"[GetTime]\nwhat time is it\ntell me the time\n\n[GetTemperature]\nwhats the temperature\nhow (hot | cold) is it\n\n[GetGarageState]\nis the garage door (open | closed)\n\n[ChangeLightState]\nlight_name = ((living room lamp | garage light) {name}) | <ChangeLightColor.light_name>\nlight_state = (on | off) {state}\n\nturn <light_state> [the] <light_name>\nturn [the] <light_name> <light_state>\n\n[ChangeLightColor]\nlight_name = (bedroom light) {name}\n\nset [the] <light_name> [to] $color\nmake [the] <light_name> $color","slots":[{"fileName":"rhasspy/number","managedBy":"external","fileContent":null,"executable":true},{"fileName":"color","managedBy":"external","fileContent":null,"executable":false}],"removeSlots":true}]

Speech To Text node

The speech to text node can be used to recognize sentences (which are specified in the selected config node).

STT flow

[{"id":"a130ba16.223568","type":"voice2json-stt","z":"11289790.c89848","name":"","voice2JsonConfig":"3cf7b405.ee3c5c","inputField":"payload","controlField":"control","outputField":"payload","autoStart":true,"x":660,"y":380,"wires":[["b1517366.97b42"]]},{"id":"b1517366.97b42","type":"debug","z":"11289790.c89848","name":"Text","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","x":850,"y":380,"wires":[]},{"id":"9bb2c518.0a5628","type":"http request","z":"11289790.c89848","name":"Load wav buffer","method":"GET","ret":"bin","paytoqs":false,"url":"https://raw.githubusercontent.com/johanneskropf/node-red-contrib-voice2json/master/wav/turn_on_lights_kitchen.wav","tls":"","persist":false,"proxy":"","authType":"","x":440,"y":380,"wires":[["a130ba16.223568"]]},{"id":"6940b487.dcee8c","type":"inject","z":"11289790.c89848","name":"Execute STT","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":250,"y":380,"wires":[["9bb2c518.0a5628"]]},{"id":"697e631.4c9599c","type":"inject","z":"11289790.c89848","name":"Start","topic":"","payload":"start","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":230,"y":280,"wires":[["77b4222d.8654cc"]]},{"id":"fd112978.dcb108","type":"inject","z":"11289790.c89848","name":"Stop","topic":"","payload":"stop","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":230,"y":320,"wires":[["77b4222d.8654cc"]]},{"id":"77b4222d.8654cc","type":"change","z":"11289790.c89848","name":"payload -> control","rules":[{"t":"move","p":"payload","pt":"msg","to":"control","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":450,"y":320,"wires":[["a130ba16.223568"]]},{"id":"3cf7b405.ee3c5c","type":"voice2json-config","z":"","profilePath":"/home/pi/voice2json_profile/en-us_kaldi-zamia-2.0","name":"Kaldi english profile","sentences":"[TurnLigths]\r\nturn (on | off){state} the light in the (kitchen | 
bathroom){room}","slots":[],"removeSlots":true}]
  1. The STT node needs to be started. There are 3 different ways to accomplish this:

    • This node offers auto-start (at deployment time), which can be enabled by activating the "auto start transcriber" checkbox on the config screen. The advantage is that this node will be started immediately (after a deploy or startup), which means it will be ready as soon as the first input voice message arrives.
    • This node can be started explicitly by injecting an input message with "start" (and stopped with "stop") as the content of the configured control property of the input msg object. This is mostly used to restart this node after a new training has been executed.
    • This node will be started automatically when an input voice message arrives and the node has not been started yet. When relying solely on this mode, the first input voice message will take a while to process (since voice2json still needs to load all its resources). It is therefore advised to use one of the first two modes, so that voice2json can load its resources before audio arrives (which greatly reduces the time of the first transcription). Combining them with this last mode adds fail safety: if the voice2json process is halted for some reason, this node will automatically restart the process when the next input voice message arrives, which might be useful in a 24/7 setup.
  2. Once started, start injecting input data containing a WAV audio buffer or the path to a WAV file via msg.payload.

  3. The STT node will try to recognize the sentences that have been specified in the sentences tab of the config node. If you change your slots or sentences, you need to retrain and then restart the STT node for it to pick up those changes:

    [TurnLigths]
    turn (on | off){state} the light in the (kitchen | bathroom){room}
  4. The output message will be an object in the configured msg.property. The property text of this object contains the recognized text as a string. Here is an example output object:

    {
      "text": "turn on the light in the kitchen",
      "likelihood":1,
      "transcribe_seconds":3.4162743357010186,
      "wav_seconds":2.035,
      "tokens":null,
      "wav_name":"stta130ba16223568.wav"
    }

    As you can see the text property contains the text from the wav audio file.
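
The start and stop messages from step 1 can also be built in code, for example inside a Function node. The following is a minimal sketch; the helper name `buildControlMessage` is hypothetical, and the default property name `control` is taken from the example flow above, so adjust it if your node uses a different control property.

```javascript
// Build a control message for the STT node (hypothetical helper).
// controlField defaults to "control", as configured in the example flow.
function buildControlMessage(command, controlField = "control") {
    // Only the two commands described in this section are accepted here.
    const allowed = ["start", "stop"];
    if (!allowed.includes(command)) {
        throw new Error(`Unsupported command: ${command}`);
    }
    // The command goes into the control property, not into msg.payload,
    // so it is not mistaken for audio input.
    return { [controlField]: command };
}
```

In a Function node, `return buildControlMessage("start");` would have the same effect as the inject and change node pair in the example flow.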

Text To Intent node

Intent analysis involves searching for information (rooms, switch statuses, names, ...) in a text, as part of natural language understanding.

In the previous example flow, the STT node converted the wav file to a text sentence. Now we will send this text to the TTI node, which will extract the required information from that text.

TTI flow

[{"id":"faef3d3a.f726d","type":"debug","z":"11289790.c89848","name":"Intent","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","x":1050,"y":580,"wires":[]},{"id":"dd141eca.7d435","type":"inject","z":"11289790.c89848","name":"Inject sentence","topic":"","payload":"turn on the light in the kitchen","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":480,"y":580,"wires":[["88000e6b.23f55"]]},{"id":"88000e6b.23f55","type":"voice2json-tti","z":"11289790.c89848","name":"","voice2JsonConfig":"3cf7b405.ee3c5c","inputField":"payload","controlField":"control","outputField":"payload","autoStart":true,"x":880,"y":580,"wires":[["faef3d3a.f726d"]]},{"id":"8cebce39.0bc7c","type":"change","z":"11289790.c89848","name":"payload -> control","rules":[{"t":"move","p":"payload","pt":"msg","to":"control","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":670,"y":520,"wires":[["88000e6b.23f55"]]},{"id":"6fdd8681.afc558","type":"inject","z":"11289790.c89848","name":"Start","topic":"","payload":"start","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":450,"y":480,"wires":[["8cebce39.0bc7c"]]},{"id":"13b82eb1.b72bf1","type":"inject","z":"11289790.c89848","name":"Stop","topic":"","payload":"stop","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":450,"y":520,"wires":[["8cebce39.0bc7c"]]},{"id":"3cf7b405.ee3c5c","type":"voice2json-config","z":"","profilePath":"/home/pi/voice2json_profile/en-us_kaldi-zamia-2.0","name":"Kaldi english profile","sentences":"[TurnLigths]\r\nturn (on | off){state} the light in the (kitchen | bathroom){room}","slots":[],"removeSlots":true}]
  1. The sentences in the config node contain all the information that we want to extract from the text:

    [TurnLigths]
    turn (on | off){state} the light in the (kitchen | bathroom){room}
  2. This way the TTI node knows that it needs to determine the light 'state' (which can be on or off) and the room (which can be kitchen or bathroom).

  3. The output will contain the information about both variables:

    {
      "text":"turn on the light in the kitchen",
      "intent":{
         "name":"TurnLigths",
         "confidence":1
      },
      "entities":[{
         "entity":"state",
         "value":"on",
         ...
      },
      {
         "entity":"room",
         "value":"kitchen",
         ...
      }],
      ...
      "slots":{
         "state":"on",
         "room":"kitchen"
      }
    }

    Based on the confidence field, it is possible to determine whether you want to accept the value or reject it...
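
A check on the confidence field could look like the sketch below, for example in a Function node placed after the TTI node. The function name `acceptIntent` and the threshold of 0.8 are assumptions chosen for illustration; a suitable threshold depends on your profile and sentences.

```javascript
// Decide whether a TTI result should be accepted (hypothetical helper).
// minConfidence is an illustrative default, not a recommended value.
function acceptIntent(result, minConfidence = 0.8) {
    const confidence = result && result.intent
        ? result.intent.confidence
        : undefined;
    // Reject results without a numeric confidence value.
    return typeof confidence === "number" && confidence >= minConfidence;
}

// Example with the (shortened) output object shown above.
const example = {
    text: "turn on the light in the kitchen",
    intent: { name: "TurnLigths", confidence: 1 },
    slots: { state: "on", room: "kitchen" }
};
const accepted = acceptIntent(example); // true, since 1 >= 0.8
```

Rejected results can simply be dropped, or routed to a second output to give the user feedback that the command was not understood.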

The node can be started, stopped or restarted in the same way as the STT node: by sending a message with a valid command in the configured control property of the input message object.

Advanced topics

How the transcription / intent recognition works in voice2json

To learn how voice2json works in detail, we recommend having a look at the whitepaper by the voice2json project, which describes the whole process.

Some basics about this:

Limiting false positive results

Since the STT node will always give a result (i.e. the closest one) even if it doesn't match, there will be false positives caused by random audio. These false positives can be reduced with a number of strategies:

Minimizing SD card wearing

Voice2json expects all input to be file-based, which means you have to store a file on the filesystem and pass the file path to Voice2json. The simplest way to use the STT node is therefore:

Filesystem

  1. Other nodes in the flow store a wav file on the filesystem.
  2. A path to that file is injected via an input message into the STT node.
  3. The STT node calls Voice2json and passes that file path.
  4. Voice2json will load the wav file from the filesystem, and process it.

However, this requires continuously writing to the filesystem, which can be very destructive for some hardware like SD cards. To solve that problem, an in-memory approach has been provided:

In memory

  1. Another node in the flow (e.g. Record-Command node) injects a WAV buffer via an input message into the STT node.
  2. The STT node will write the WAV file to an in-memory filesystem /dev/shm. Directory /dev/shm/ is mounted to ram by default and you can read more about it here.
  3. The STT node calls Voice2json and passes that file path.
  4. Voice2json will load the wav file from the in-memory filesystem, and process it.

Caution: not all Linux systems provide this in-memory filesystem! In those cases the STT node will use the /tmp/ directory, which again results in a lot of writing to the filesystem. Note that in these cases you will need to include this path in the above Docker file instead of /dev/shm/ (if you use the Docker container).
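
The fallback from /dev/shm to /tmp can be illustrated with a small sketch. This is not the node's actual code; the function name `pickWavDirectory` is hypothetical, and only the two directory names come from the description above.

```javascript
const fs = require("fs");

// Return the first existing directory from a list of candidates
// (hypothetical helper). The "exists" argument can be replaced to
// simulate systems without /dev/shm.
function pickWavDirectory(candidates = ["/dev/shm", "/tmp"], exists = fs.existsSync) {
    for (const dir of candidates) {
        if (exists(dir)) {
            return dir;
        }
    }
    // None of the candidate directories is available.
    return null;
}
```

Calling `pickWavDirectory()` on your own machine is a quick way to see which of the two directories would be available there.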

When /dev/shm/ is not available on hardware similar to a Raspberry Pi, another solution might be available:

  1. Create a folder using the mkdir command (for example mkdir /home/pi/tmp).
  2. Add the line tmpfs /home/pi/tmp tmpfs defaults,noatime,size=100m 0 0 to file /etc/fstab.
  3. After a reboot, the directory /home/pi/tmp will automatically be mounted to ram.
  4. Add nodes (e.g. the File-Out node) to your flow to store the WAV file into that in-memory directory.
  5. Inject a message into the STT node, containing the path to that WAV file.

An in-memory filesystem means that data in it will be lost upon reboot, but SD card writes will be greatly reduced. More information on this approach can be found here.

Limitations

Hardware setups

Some possible hardware setups are listed here to get you started. Each setup has both advantages and disadvantages. All setups fit into the flow chart at the top of this readme, which shows the logic of processing with Node-RED and voice2json and all the possible input sources.

A single do it all device (Raspberry Pi or similar)

The simplest way to set up a complete workflow from wake word to intent processing is a single device running Linux that supports both Node-RED and voice2json. This could for example be the very popular Raspberry Pi, which from model 3 onwards is more than capable of running this combination. As the most basic requirement you will also need some form of microphone. Cheap USB conference microphones that are Linux compatible can be a good start. Another popular option is the ReSpeaker Pi HATs and microphones. You may also want to add a small speaker for sound feedback. You can then set up a complete speech command workflow on this device purely from Node-RED. Install one of the microphone nodes and connect it to the suite of voice2json nodes, streaming raw audio buffers in the right format.

Master satellite setup with Raspberry Zero for voice capture

When a series of microphones needs to be installed in a building, it might become too expensive to use Raspberry Pi (3 or 4) devices. In those cases one might consider using Raspberry Pi Zero devices to reduce the cost. However, a single-core Raspberry Pi Zero is not powerful enough to run wake-word detection. As a result the Zero will run a Node-RED flow that captures audio from its microphone and sends that audio (as a continuous stream) to a Raspberry Pi (3 or 4). That central Raspberry Pi will run a Node-RED flow that does all the Voice2Json processing:

Zero setup

Keep in mind that this setup will result in a large amount of network traffic, even when you are not using speech recognition! This can only be solved by running the wake-word detection on the device which is connected to the microphone.
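
To get a feeling for the amount of traffic, the raw audio data rate can be estimated as follows. The sketch assumes the audio format used in the example flows on this page (16000 Hz, 16 bit, mono) and ignores any network protocol overhead; the function name is made up for illustration.

```javascript
// Bytes per second of an uncompressed PCM audio stream.
function rawAudioBytesPerSecond(sampleRate, bitsPerSample, channels) {
    return sampleRate * (bitsPerSample / 8) * channels;
}

// Format taken from the sox-record nodes in the example flows.
const bytesPerSecond = rawAudioBytesPerSecond(16000, 16, 1); // 32000
const kilobitsPerSecond = (bytesPerSecond * 8) / 1000;       // 256
const bytesPerDay = bytesPerSecond * 60 * 60 * 24;           // 2764800000
```

Under these assumptions a single satellite streams about 256 kbit/s, or roughly 2.8 GB per day, for as long as the stream is running.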

An Apple iOS siri-shortcut to send audio to nodered to be processed by voice2json

You can create a Siri shortcut in the Shortcuts app on your iPhone or iPad with content like this:

to send audio via an http request to nodered and convert it to the right format with sox-utils:

[{"id":"a87acd93.c30f4","type":"http in","z":"6417ff5b.9a455","name":"","url":"/audio","method":"put","upload":false,"swaggerDoc":"","x":130,"y":2080,"wires":[["71abe4b.00b8d1c","94801716.ebafc8"]]},{"id":"71abe4b.00b8d1c","type":"sox-convert","z":"6417ff5b.9a455","name":"","conversionType":"wav","outputToFile":"buffer","manualPath":"","wavMore":true,"wavByteOrder":"-L","wavEncoding":"signed-integer","wavChannels":1,"wavRate":16000,"wavBits":16,"flacMore":false,"flacCompression":8,"flacChannels":1,"flacRate":16000,"flacBits":16,"mp3More":false,"mp3Channels":2,"mp3Rate":44100,"mp3BitRate":128,"oggMore":false,"oggCompression":3,"oggChannels":2,"oggRate":44100,"debugOutput":false,"x":310,"y":2080,"wires":[["d09c9f59.8ed97"],[]]},{"id":"94801716.ebafc8","type":"http response","z":"6417ff5b.9a455","name":"","statusCode":"","headers":{},"x":290,"y":2140,"wires":[]},{"id":"d09c9f59.8ed97","type":"voice2json-stt","z":"6417ff5b.9a455","name":"","voice2JsonConfig":"a66d83bd.16a7d8","inputField":"payload","controlField":"control","outputField":"payload","autoStart":true,"x":500,"y":2080,"wires":[["207c9eec.afe73a"]]},{"id":"207c9eec.afe73a","type":"debug","z":"6417ff5b.9a455","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":690,"y":2080,"wires":[]},{"id":"a66d83bd.16a7d8","type":"voice2json-config","z":"","profilePath":"/home/pi/en-us_kaldi-zamia-2.0","name":"enUsKaldi","sentences":"[GetTime]\nwhat time is it\ntell me the time\n\n[GetTemperature]\nwhats the temperature\nhow (hot | cold) is it\n\n[GetGarageState]\nis the garage door (open | closed)\n\n[ChangeLightState]\nlight_name = ((living room lamp | garage light) {name}) | <ChangeLightColor.light_name>\nlight_state = (on | off) {state}\n\nturn <light_state> [the] <light_name>\nturn [the] <light_name> <light_state>\n\n[ChangeLightColor]\nlight_name = (bedroom light) {name}\n\nset [the] <light_name> [to] $color\nmake [the] <light_name> 
$color","slots":[{"fileName":"rhasspy/number","managedBy":"external","fileContent":null,"executable":true},{"fileName":"color","managedBy":"external","fileContent":null,"executable":false}],"removeSlots":true}]

This approach will work for any audio source that can send an audio file in a convertible format to nodered over an http request, mqtt or a websocket.
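
As an illustration of such an HTTP client, the sketch below prepares a request for the /audio endpoint of the example flow, which expects the PUT method. The helper name `buildUploadRequest`, the host name and the Content-Type header are assumptions; 1880 is the default Node-RED port, and the path will differ if your installation uses a custom HTTP node root.

```javascript
// Prepare an upload request for the "http in" node of the example flow
// (hypothetical helper). "audio" is a Buffer or Uint8Array with the file.
function buildUploadRequest(baseUrl, audio) {
    return {
        url: new URL("/audio", baseUrl).toString(),
        options: {
            method: "PUT",
            // Assumption: a binary content type, so the body is not
            // parsed as text or JSON.
            headers: { "Content-Type": "application/octet-stream" },
            body: audio
        }
    };
}

// Usage with the global fetch of recent Node.js versions (not executed here):
// const { url, options } = buildUploadRequest("http://localhost:1880", audioBuffer);
// const response = await fetch(url, options);
```

The conversion to the right audio format still happens in the flow itself, so the client can send the file as it is.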