rhasspy / rhasspy3

An open source voice assistant toolkit for many human languages
MIT License

Rhasspy 3

NOTE: This is a very early developer preview!

An open source toolkit for building voice assistants.

Voice assistant pipeline

Rhasspy focuses on:

Getting Started

Missing Pieces

This is a developer preview, so there are lots of things missing:

Core Concepts

Domains

Rhasspy is organized by domain:

Programs

Rhasspy talks to external programs using the Wyoming protocol. You can add your own programs by implementing the protocol or using an adapter.
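To make "implementing the protocol" more concrete, here is a minimal sketch of event framing. It assumes that each Wyoming event is sent as one JSON header line with `type`, optional `data`, and an optional `payload_length`, followed by that many raw payload bytes (for example, a chunk of PCM audio). The event names `transcript` and `audio-chunk` are used for illustration only. Check the Wyoming specification before relying on any field names.

```python
import io
import json
from typing import BinaryIO, Optional, Tuple

# Assumed framing: one JSON header line per event, optionally followed by
# exactly `payload_length` raw bytes (e.g. PCM audio).


def write_event(out: BinaryIO, event_type: str, data: Optional[dict] = None,
                payload: Optional[bytes] = None) -> None:
    header = {"type": event_type}
    if data:
        header["data"] = data
    if payload:
        header["payload_length"] = len(payload)
    out.write(json.dumps(header).encode("utf-8") + b"\n")
    if payload:
        out.write(payload)


def read_event(inp: BinaryIO) -> Optional[Tuple[str, dict, Optional[bytes]]]:
    line = inp.readline()
    if not line:
        return None  # end of stream
    header = json.loads(line)
    payload = None
    length = header.get("payload_length")
    if length:
        payload = inp.read(length)  # binary payload follows the header line
    return header["type"], header.get("data", {}), payload


# Round trip through an in-memory stream
stream = io.BytesIO()
write_event(stream, "transcript", {"text": "turn on the light"})
stream.seek(0)
event = read_event(stream)
```

Because the header is plain JSON, any language that can read a line and parse JSON can act as a Wyoming program. Only the payload bytes need binary handling.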

Adapters

Adapters are small scripts that live in bin/ and bridge existing programs into the Wyoming protocol.

For example, a speech-to-text program (asr) that accepts a WAV file and outputs text can use asr_adapter_wav2text.py.
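A WAV-to-text adapter has two main jobs. It wraps the raw PCM audio it receives in a WAV container, and it runs the wrapped program and reads back the text. The sketch below covers both jobs with the standard `wave` and `subprocess` modules. It is not the code of asr_adapter_wav2text.py. The helper names are hypothetical, and the sketch assumes the program reads WAV data on stdin and prints its transcript on stdout. The real adapter may pass a file path instead.

```python
import io
import subprocess
import wave
from typing import List


def pcm_to_wav(pcm: bytes, rate: int = 16000, width: int = 2, channels: int = 1) -> bytes:
    """Wrap raw PCM audio in a WAV container, entirely in memory."""
    with io.BytesIO() as wav_io:
        with wave.open(wav_io, "wb") as wav_file:
            wav_file.setframerate(rate)
            wav_file.setsampwidth(width)  # bytes per sample
            wav_file.setnchannels(channels)
            wav_file.writeframes(pcm)
        # wave does not close a file object it did not open, so this is safe
        return wav_io.getvalue()


def transcribe_wav(command: List[str], wav_bytes: bytes) -> str:
    """Hypothetical: feed WAV bytes to a program on stdin and return its stdout text."""
    result = subprocess.run(command, input=wav_bytes, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8").strip()
```

A full adapter would also collect incoming audio events until the stream ends, and then send the text back as a Wyoming event.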

Pipelines

A pipeline is a complete voice loop from microphone input (mic) to speaker output (snd). Its stages are:

  1. detect (optional)
    • Wait until the wake word is detected in mic
  2. transcribe
    • Listen until vad detects silence, then convert the audio to text
  3. recognize (optional)
    • Recognize an intent from text
  4. handle
    • Handle an intent or text, producing a text response
  5. speak
    • Convert the handle output text to speech and play it through snd
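The order of the stages, and which ones are optional, can be modeled in process as a toy. In the sketch below each stage is a plain Python callable, and the mic is a single audio stream consumed stage by stage. In Rhasspy itself, each stage is a separate program reached over the Wyoming protocol. The `Pipeline` class and the stand-in stages are hypothetical and only illustrate the control flow.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class Pipeline:
    """Hypothetical in-process model of the five pipeline stages."""
    transcribe: Callable[[Iterator[bytes]], str]
    handle: Callable[[object], str]
    speak: Callable[[str], None]
    detect: Optional[Callable[[Iterator[bytes]], bool]] = None
    recognize: Optional[Callable[[str], object]] = None

    def run(self, mic: Iterable[bytes]) -> Optional[str]:
        audio = iter(mic)  # one shared stream, consumed stage by stage
        # 1. detect (optional): read audio until the wake word is heard
        if self.detect is not None and not self.detect(audio):
            return None  # mic stream ended without a wake word
        # 2. transcribe: read audio until silence, then return the text
        text = self.transcribe(audio)
        # 3. recognize (optional): turn the text into an intent
        request = self.recognize(text) if self.recognize is not None else text
        # 4. handle: produce a text response from the intent or raw text
        response = self.handle(request)
        # 5. speak: synthesize the response and play it
        self.speak(response)
        return response


# Toy stand-ins: "audio" chunks are words, and b"" plays the role of silence.
def wake_word(audio: Iterator[bytes]) -> bool:
    return any(chunk == b"wake" for chunk in audio)


def until_silence(audio: Iterator[bytes]) -> str:
    words: List[str] = []
    for chunk in audio:
        if chunk == b"":
            break
        words.append(chunk.decode())
    return " ".join(words)


spoken: List[str] = []
pipeline = Pipeline(transcribe=until_silence, handle=lambda text: f"You said: {text}",
                    speak=spoken.append, detect=wake_word)
pipeline.run([b"noise", b"wake", b"hello", b"there", b"", b"ignored"])
```

When detect is omitted, transcription begins immediately. When recognize is omitted, handle receives the raw transcript, which matches "Handle an intent or text" in the list above.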

Servers

Some programs take a while to load, so it's best to leave them running as a server. Use bin/server_run.py or add --server <domain> <name> when running the HTTP server.

See the servers section of the configuration.yaml file.


Supported Programs


HTTP API

http://localhost:13331/<endpoint>

Unless overridden, the pipeline named "default" is used.
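This excerpt does not list the endpoint names, so the client sketch below takes the endpoint as an argument. It uses only the standard library. The example endpoint `pipeline/run`, the use of POST, and the `pipeline` query parameter for overriding the "default" pipeline are all assumptions made for illustration.

```python
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:13331"


def build_url(endpoint: str, pipeline: Optional[str] = None) -> str:
    # Assumption: a "pipeline" query parameter selects a non-default pipeline.
    url = f"{BASE_URL}/{endpoint.lstrip('/')}"
    if pipeline:
        url += "?" + urllib.parse.urlencode({"pipeline": pipeline})
    return url


def post(endpoint: str, body: bytes = b"", pipeline: Optional[str] = None,
         timeout: float = 30.0) -> bytes:
    # Hypothetical call shape: POST the body and return the raw response bytes.
    request = urllib.request.Request(build_url(endpoint, pipeline), data=body, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With a server running, `post("pipeline/run")` would target the "default" pipeline, and `post("pipeline/run", pipeline="kitchen")` a hypothetical pipeline named "kitchen".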

WebSocket API

ws://localhost:13331/<endpoint>

Audio streams are raw PCM in binary messages.

Use the rate, width, and channels parameters for sample rate (hertz), width (bytes), and channel count. By default, input audio is 16 kHz 16-bit mono, and output audio is 22 kHz 16-bit mono.
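Raw PCM has no header, so the three parameters fully determine how many bytes one second of audio occupies: rate × width × channels. For the default input format, that is 16000 × 2 × 1 = 32,000 bytes per second, and a 100 ms chunk is 3,200 bytes. The sketch below computes this and builds a WebSocket URL. It assumes that rate, width, and channels are passed as query-string parameters, and the endpoint name is left to the caller.

```python
import urllib.parse


def bytes_per_second(rate: int, width: int, channels: int) -> int:
    # Raw PCM: every second carries rate * width * channels bytes
    return rate * width * channels


def ws_url(endpoint: str, rate: int = 16000, width: int = 2, channels: int = 1) -> str:
    # Assumption: audio format parameters go in the query string
    query = urllib.parse.urlencode({"rate": rate, "width": width, "channels": channels})
    return f"ws://localhost:13331/{endpoint.lstrip('/')}?{query}"


input_bps = bytes_per_second(16000, 2, 1)
chunk_100ms = input_bps // 10
```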

The client can "end" the audio stream by sending an empty binary message.
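On the client side, sending audio means splitting the PCM into binary messages and finishing with one empty message. The sketch below keeps the transport abstract. `send` is any callable that sends one binary message, for example the send method of a connection from a third-party WebSocket library. The 3,200-byte default (100 ms of 16 kHz 16-bit mono) is an illustrative choice, not a documented requirement.

```python
from typing import Callable, Iterator


def iter_messages(pcm: bytes, chunk_bytes: int = 3200) -> Iterator[bytes]:
    """Split raw PCM into binary messages, then yield b"" to end the stream."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
    yield b""  # empty binary message: tells the server the audio has ended


def stream_audio(send: Callable[[bytes], None], pcm: bytes, chunk_bytes: int = 3200) -> int:
    """Send every message through `send` and return how many were sent."""
    count = 0
    for message in iter_messages(pcm, chunk_bytes):
        send(message)
        count += 1
    return count
```

Choose a chunk size that is a multiple of width × channels, so that no sample is split across two messages.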