NOTE: This is a very early developer preview!
An open source toolkit for building voice assistants.
Rhasspy focuses on:
This is a developer preview, so there are lots of things missing:
Rhasspy is organized by domain:
Rhasspy talks to external programs using the Wyoming protocol. You can add your own programs by implementing the protocol or using an adapter.
Small scripts that live in bin/ and bridge existing programs into the Wyoming protocol.
For example, a speech to text program (asr) that accepts a WAV file and outputs text can use asr_adapter_wav2text.py.
Complete voice loop from microphone input (mic) to speaker output (snd). Stages are:
Some programs take a while to load, so it's best to leave them running as a server. Use bin/server_run.py or add --server <domain> <name> when running the HTTP server.
See the servers section of the configuration.yaml file.
http://localhost:13331/<endpoint>
Unless overridden, the pipeline named "default" is used.
/pipeline/run
- pipeline or:
  - wake_program
  - asr_program
  - intent_program
  - handle_program
  - tts_program
  - snd_program
- start_after
  - wake - skip detection, body is detection name (text)
  - asr - skip recording, body is transcript (text) or WAV audio
  - intent - skip recognition, body is intent/not-recognized event (JSON)
  - handle - skip handling, body is handle/not-handled event (JSON)
  - tts - skip synthesis, body is WAV audio
- stop_after
  - wake - only detection
  - asr - detection and transcription
  - intent - detection, transcription, recognition
  - handle - detection, transcription, recognition, handling
  - tts - detection, transcription, recognition, handling, synthesis
/wake/detect
- wake_program or pipeline
/asr/transcribe
- asr_program or pipeline
/intent/recognize
- text (GET)
- intent_program or pipeline
/handle/handle
- input (GET)
- Content-Type must be application/json for intent input
- handle_program or pipeline
/tts/synthesize
- text (GET)
- tts_program or pipeline
/tts/speak
- text (GET)
- tts_program, snd_program, or pipeline
/snd/play
- snd_program or pipeline
/config
/version
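As a rough illustration of how the HTTP endpoints above might be called, the sketch below builds (but does not send) requests with Python's standard library. It assumes that names such as text, start_after, and stop_after are passed as URL query parameters, which this README does not state explicitly. The helper name build_request is hypothetical and not part of Rhasspy.

```python
from typing import Optional, Union
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:13331"  # base URL given in this README


def build_request(
    endpoint: str,
    body: Optional[bytes] = None,
    **params: Union[str, int],
) -> Request:
    """Build a request for an endpoint (hypothetical helper).

    Assumption: overrides and options are sent as query parameters.
    A request with a body is sent as POST, otherwise as GET.
    """
    query = urlencode(params)
    url = f"{BASE_URL}{endpoint}" + (f"?{query}" if query else "")
    method = "POST" if body is not None else "GET"
    return Request(url, data=body, method=method)


# GET request that passes the text to synthesize as a parameter.
tts_request = build_request("/tts/synthesize", text="Hello world")

# Skip detection and recording by supplying a transcript as the body,
# and stop once intent recognition has run.
run_request = build_request(
    "/pipeline/run",
    body=b"turn on the light",
    start_after="asr",
    stop_after="intent",
)

print(tts_request.get_method(), tts_request.full_url)
print(run_request.get_method(), run_request.full_url)
```

With the HTTP server running, a request built this way could be sent with urllib.request.urlopen(request). Nothing here contacts the server, so the sketch can be run without one.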
ws://localhost:13331/<endpoint>
Audio streams are raw PCM in binary messages.
Use the rate, width, and channels parameters for sample rate (hertz), sample width (bytes), and channel count. By default, input audio is 16 kHz 16-bit mono, and output audio is 22 kHz 16-bit mono.
The client can "end" the audio stream by sending an empty binary message.
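The framing described above can be sketched as follows: raw PCM is split into binary messages, and an empty message marks the end of the stream. This is only a sketch. It assumes rate, width, and channels are URL query parameters, and the names stream_url and audio_messages, as well as the 2048-byte chunk size, are hypothetical choices rather than anything Rhasspy prescribes. No WebSocket client is used, so the code only prepares the URL and the message payloads.

```python
from typing import Iterator
from urllib.parse import urlencode

BASE_URL = "ws://localhost:13331"  # base URL given in this README


def stream_url(endpoint: str, rate: int = 16000, width: int = 2, channels: int = 1) -> str:
    """WebSocket URL with the audio format (assumed to be query parameters)."""
    query = urlencode({"rate": rate, "width": width, "channels": channels})
    return f"{BASE_URL}{endpoint}?{query}"


def audio_messages(pcm: bytes, chunk_bytes: int = 2048) -> Iterator[bytes]:
    """Yield raw PCM chunks, then an empty message to end the stream."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
    yield b""  # empty binary message signals the end of the audio


# One second of silence in the default input format:
# 16000 samples/s * 2 bytes per sample * 1 channel = 32000 bytes.
one_second = bytes(16000 * 2 * 1)
messages = list(audio_messages(one_second))

print(stream_url("/asr/transcribe"))
print(len(messages), "messages, last one empty:", messages[-1] == b"")
```

Each yielded value would be sent as one binary message by whichever WebSocket client library is in use.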
/pipeline/asr-tts
- pipeline or:
  - asr_program
  - vad_program
  - handle_program
  - tts_program
- in_rate, in_width, in_channels for audio input format
- out_rate, out_width, out_channels for audio output format
/wake/detect
- wake_program or pipeline
/asr/transcribe
- asr_program or pipeline
/snd/play
- snd_program or pipeline