sk89q / hey-victoria

TeamSpeak bot w/ speech recognition (like Siri, OK Google, Cortana, etc.)

Hey Victoria

Hey Victoria is an experimental English-understanding speech recognition assistant that connects to a TeamSpeak 3 channel. She is controlled entirely through speech.

Examples of commands that Victoria can currently understand include:

The project is currently in a proof-of-concept state and is rough around the edges.

Architecture

To record what is spoken, Victoria currently relies on a TeamSpeak plugin, because no library is available for connecting to a TeamSpeak server directly.

Each user's voice data is sent to a listening server that performs the necessary speech recognition.

Currently, the client must run on the same system and under the same user account as the TeamSpeak client. In addition, TeamSpeak must be configured to capture the output of the default audio output device. Some of Victoria's components currently require Microsoft Windows.

Prerequisites

Python libraries:

Supporting software:

Data:

API keys:

Installation

Everything should be run on the same user account in Windows, and TeamSpeak should be configured to capture the output of the default audio output device.

Voice Copy Plugin

The Voice Copy plugin is the TeamSpeak plugin component.

  1. Compile the solution found in the ts3_voice_copy/ folder. Remember to select the appropriate architecture for your TeamSpeak client version (Win32 or x64).
  2. Install the plugin found in the bin/ folder into TeamSpeak.
  3. Enable the plugin in TeamSpeak.

By default, the voice copy plugin is configured to send voice data to port 32000 at 127.0.0.1. To adjust this, change plugin.c appropriately.
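To check that the plugin is sending anything at all, a small probe can listen on the same address. The sketch below is not part of the project. It assumes the plugin sends UDP datagrams, which this document does not state; if the plugin uses TCP, the probe will see nothing. The names `open_probe` and `receive_one` are hypothetical.

```python
import socket


def open_probe(host="127.0.0.1", port=32000, timeout=5.0):
    """Bind a UDP socket where the plugin is ASSUMED to send voice data."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.bind((host, port))
    return sock


def receive_one(sock, max_bytes=65535):
    """Return (payload_length, sender_address) for one datagram, or None on timeout."""
    try:
        data, addr = sock.recvfrom(max_bytes)
    except socket.timeout:
        return None
    return len(data), addr


if __name__ == "__main__":
    probe = open_probe()
    try:
        print(receive_one(probe))
    finally:
        probe.close()
```

Stop the listen server before running the probe, since only one process can normally bind the port at a time.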

Listen Server

Inside the listen_server/ folder:

Create a config.ini file and in it, place:

[server]
host=127.0.0.1
port=32000

[youtube]
apiKey=

Configure the values and enter your YouTube API key.
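A quick way to sanity-check the file before starting the server is to parse it with Python's standard `configparser` module. This is only a sketch: how listen.py actually reads the file is not shown here, and `load_settings` and `SAMPLE` are hypothetical names. Only the section and key names come from the example above.

```python
import configparser

# Same layout as the config.ini shown above; the API key is left blank.
SAMPLE = """\
[server]
host=127.0.0.1
port=32000

[youtube]
apiKey=
"""


def load_settings(text):
    """Parse config text and return (host, port, api_key).

    Raises ValueError if the YouTube API key has not been filled in.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    host = parser.get("server", "host")
    port = parser.getint("server", "port")
    api_key = parser.get("youtube", "apiKey").strip()
    if not api_key:
        raise ValueError("youtube.apiKey is empty; enter your YouTube API key")
    return host, port, api_key
```

Calling `load_settings(SAMPLE)` raises `ValueError` because the key is blank, which catches the most likely setup mistake early.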

Run listen.py with the path to the configuration file: python listen.py config.ini

On first start, a phrase should be spoken over text-to-speech and the beep sounds should play.

Usage

Victoria works best in a channel set to the Opus Music audio quality setting. Other codecs significantly degrade the ability for the assistant to detect the key phrase.

When the key phrase ("Victoria") is detected, a beep is played. The command must be spoken after the beep. Full sentences are recognized better than single-word commands, although Victoria ultimately looks for a specific word to decide what to do.

Once the speaker has finished talking, Victoria sounds another beep a second or two after the silence begins. Victoria also eventually stops listening if the speaker does not appear to stop speaking.
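The two stopping rules described above (a run of silence, or an overall time limit) can be illustrated with a simple per-frame check. This is a minimal sketch, not Victoria's actual detection code. The function name, the loudness threshold, and the frame counts are all assumptions, chosen so that with 20 ms frames 75 frames is about 1.5 seconds and 500 frames is about 10 seconds.

```python
def find_end_of_command(levels, silence_level=500, silence_frames=75, max_frames=500):
    """Decide where to stop recording a command.

    levels: iterable of per-frame loudness values (hypothetical units).
    Returns (frame_index, reason), where reason is "silence" or "timeout",
    or (None, None) if the input ends before either rule fires.
    """
    silent_run = 0
    for i, level in enumerate(levels):
        # Count consecutive quiet frames; any loud frame resets the run.
        silent_run = silent_run + 1 if level < silence_level else 0
        if silent_run >= silence_frames:
            return i, "silence"
        # Give up if the speaker never seems to stop.
        if i + 1 >= max_frames:
            return i, "timeout"
    return None, None
```

Note that this sketch also counts silence before the speaker starts, so a real implementation would likely wait for speech to begin first.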

The first invocation of the speech recognition engine may give very poor results. If it does, try the command again.

Commands currently include:

The flow of interaction is:

  1. The speaker mentions "Victoria"
  2. A sound is emitted indicating recognition
  3. The speaker mentions a command
  4. The speaker stops speaking
  5. A different sound is emitted indicating that the recording has finished
  6. The assistant responds accordingly

If the command portion is not recognized or an unknown command is mentioned, then Victoria will say so using text-to-speech.
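The last step of the flow, picking a response based on a specific word in the recognized sentence, could look roughly like the sketch below. It is an illustration only: `handle_utterance`, the `"play"` and `"stop"` keywords, and the reply strings are hypothetical and are not taken from Victoria's real command set.

```python
def handle_utterance(transcript, handlers):
    """Choose a response for the recognized command text.

    transcript: recognized text, or an empty string if nothing was recognized.
    handlers: dict mapping a keyword to a function that takes the transcript.
    Returns ("run", result) for a known keyword, else ("say", message),
    where the message would be spoken with text-to-speech.
    """
    if not transcript:
        return "say", "Sorry, I did not catch that."
    # Strip basic punctuation so "play." still matches the "play" keyword.
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    for word in words:
        if word in handlers:
            return "run", handlers[word](transcript)
    return "say", "Sorry, I do not know that command."


if __name__ == "__main__":
    # Hypothetical handlers for illustration only.
    demo = {"play": lambda text: "playing", "stop": lambda text: "stopped"}
    print(handle_utterance("Please play some music.", demo))
    print(handle_utterance("Make me a sandwich", demo))
```

Matching on a single keyword anywhere in the sentence is what lets full sentences work as commands, at the cost of occasionally triggering on an unrelated use of the word.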

License

Hey Victoria is licensed under the GNU Lesser General Public License v3.

Credits

The sounds are sourced from: