Hey Victoria

Hey Victoria is an experimental English-understanding speech recognition assistant that connects to a TeamSpeak 3 channel. She is controlled entirely through speech.

Each user in the channel is listened to individually and can interact with the assistant even when others are speaking.
The assistant is continually listening, but is triggered by mentioning her name, after which she accepts a spoken instruction.
Upon hearing her name, she will emit a sound to indicate that she is recording, followed by a different sound when she has realized that the person has finished spoken.

Examples of commands that Victoria can currently understand include:

Hey Victoria, can you play Satellite Stories Helsinki Art Scene?
Victoria, can you YouTube Brand New?
Ok Victoria, stop playback please.
Victoria, repeat "Meat bags should watch out."

The project is currently in a proof-of-concept state and is rough around the edges.

Architecture

In order to record what is spoken, a TeamSpeak plugin is currently used due to the lack of a library to connect to a TeamSpeak server.

Each user's voice data is sent to a listening server that performs the necessary speech recognition.

Architecture

Currently the client needs to run on the same system and user account as the TeamSpeak client. In addition, the default audio output device must be set as the default capture device in TeamSpeak. Some of Victoria's components currently require Microsoft Windows.

Prerequisites

Microsoft Windows
Microsoft Visual Studio (get VS Express)
32-bit Python 2.7
Visual C++ Redistributable (if using VS 2013)

Python libraries:

SpeechRecognition (pip install speechrecognition)
pyttsx (pip install pyttsx)
TextBlob (pip install textblob)
Google Data API Client (pip install google-api-python-client)
pyaudio (download binaries)
Python for Windows Extensions (download binaries)
pocketsphinx (follow README)

Supporting software:

youtube-dl (download binaries)
ffmpeg/ffplay (find unofficial binaries)

Data:

NLTK corpora (python -m textblob.download_corpora)
PocketSphinx data (from its Git repo)

API keys:

YouTube API access key (use Google Console)

Installation

Everything should be run on the same user account in Windows, and TeamSpeak should be configured to capture the output of the default audio output device.

Voice Copy Plugin

The Voice Copy plugin is the TeamSpeak plugin component.

Compile the solution found in the ts3_voice_copy/ folder. Remember to select the appropriate architecture for your TeamSpeak client version (Win32 or x64).
Install the plugin found in the bin/ folder into TeamSpeak.
Enable the plugin in TeamSpeak.

By default, the voice copy plugin is configured to send voice data to port 32000 at 127.0.0.1. To adjust this, change plugin.c appropriately.

Listen Server

Inside the listen_server/ folder:

Place into pocketsphinx/ the model/ and test/ folders from the pocketsphinx project (see prerequisites above).
Place ffplay.exe and youtube-dl.exe into the bin/ folder.

Create a config.ini file and in it, place:

[server]
host=127.0.0.1
port=32000

[youtube]
apiKey=

Configure the values and enter your YouTube API key.

Run listen.py with the path to the configuration file: python listen.py config.ini

On initial start, something should be said over text to speech and the beep sounds should be heard.

Usage

Victoria works best in a channel set to the Opus Music audio quality setting. Other codecs significantly degrade the ability for the assistant to detect the key phrase.

If the key phrase ("Victoria") is heard, a beep sound should be heard. A command must be then said afterwards, taking into consideration that sentences are recognized better than single word commands. However, ultimately Victoria is looking for a specific word to decide what to do.

Once the speaker has finished talking, Victoria will sound another beep a second or two after silence had started. Victoria will also eventually stop listening if the speaker does not seem to stop speaking.

The first invocation of the speech recognition engine may have very poor results. Try again a second time.

Commands currently include:

"say," or "repeat" followed by the text for Victoria to speak using TTS
"youtube," or "play" followed by a search query to play a YouTube video
"stop" to stop playing anything currently playing

The flow of interaction is:

The speaker mentions "Victoria"
A sound is emitted indicating recognition
The speaker mentions a command
The speaker stops speaking
A different sound is emitted indicating that the recording has finished
The assistant responds accordingly

If the command portion is not recognized or an unknown command is mentioned, then Victoria will say so using text-to-speech.

License

Hey Victoria is licensed under GNU Lesser General Public License v3.

Credits

The sounds are sourced from:

pan14

sk89q / hey-victoria

readme

Hey Victoria

Architecture

Prerequisites

Installation

Voice Copy Plugin

Listen Server

Usage

License

Credits