
Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit
MIT License

june

Local Voice Chatbot: Ollama + HF Transformers + Coqui TTS Toolkit

OVERVIEW

june is a local voice chatbot that combines the power of Ollama (for language model capabilities), Hugging Face Transformers (for speech recognition), and the Coqui TTS Toolkit (for text-to-speech synthesis). It provides a flexible, privacy-focused solution for voice-assisted interactions on your local machine, ensuring that no data is sent to external servers.

[demo: text-only interaction]

Interaction Modes

INSTALLATION

Pre-requisites

From Source

Method 1: Direct Installation

To install june directly from the GitHub repository:

pip install git+https://github.com/mezbaul-h/june.git@master

Method 2: Clone and Install

Alternatively, you can clone the repository and install it locally:

git clone https://github.com/mezbaul-h/june.git
cd june
pip install .

USAGE

Pull the language model (default is llama3.1:8b-instruct-q4_0) with Ollama first, if you haven't already:

ollama pull llama3.1:8b-instruct-q4_0

Next, run the program (with default configuration):

june-va

This will use llama3.1:8b-instruct-q4_0 for LLM capabilities, openai/whisper-small.en for speech recognition, and tts_models/en/ljspeech/glow-tts for audio synthesis.

You can also customise the behaviour of the program with a JSON configuration file:

june-va --config path/to/config.json

[!NOTE] The configuration file is optional. To learn more about the structure of the config file, see the Customization section.

CUSTOMIZATION

The application can be customised with a JSON configuration file. The default configuration is as follows:

{
    "llm": {
        "disable_chat_history": false,
        "model": "llama3.1:8b-instruct-q4_0"
    },
    "stt": {
        "device": "torch device identifier (`cuda` if available; otherwise `cpu`)",
        "generation_args": {
            "batch_size": 8
        },
        "model": "openai/whisper-small.en"
    },
    "tts": {
        "device": "torch device identifier (`cuda` if available; otherwise `cpu`)",
        "model": "tts_models/en/ljspeech/glow-tts"
    }
}
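The `device` values above are resolved at runtime rather than stored literally. A minimal sketch of that selection logic, assuming PyTorch is installed (`default_device` is a hypothetical helper for illustration, not part of june's actual API):

```python
def default_device() -> str:
    """Return the torch device identifier: `cuda` if available, else `cpu`."""
    try:
        import torch  # fall back to CPU when torch is absent
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Setting an explicit `"device"` in the config simply bypasses this auto-detection.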

When you use a configuration file, its values are merged into the default configuration rather than replacing it wholesale, so you can override only the parts you care about. For instance, if you do not wish to use speech recognition and only want to provide prompts through text, you can disable it with a config file containing:

{
  "stt": null
}

Similarly, you can disable the audio synthesiser, or both, to only use the virtual assistant in text mode.
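For example, a config that disables both speech recognition and audio synthesis, leaving a purely text-based assistant, follows the same null-section convention:

```json
{
  "stt": null,
  "tts": null
}
```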

If you only want to modify the device on which you want to load a particular type of model, without changing the other default attributes of the model, you could use:

{
  "tts": {
    "device": "cpu"
  }
}
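The partial-override behaviour described above amounts to a recursive merge of the user config onto the defaults. A minimal sketch in Python (`deep_merge` is a hypothetical illustration of the semantics, not june's actual implementation):

```python
def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Overlay user overrides onto defaults without discarding sibling keys."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested sections
        else:
            merged[key] = value  # scalars and null replace the default outright
    return merged

# A partial config touches only the keys it names:
defaults = {"tts": {"device": "cuda", "model": "tts_models/en/ljspeech/glow-tts"}}
print(deep_merge(defaults, {"tts": {"device": "cpu"}}))
# → {'tts': {'device': 'cpu', 'model': 'tts_models/en/ljspeech/glow-tts'}}
```

Note that `null` falls through to the replace branch, which is why `{"stt": null}` disables the whole section instead of merging into it.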

Configuration Attributes

llm - Language Model Configuration

stt - Speech-to-Text Model Configuration

tts - Text-to-Speech Model Configuration

Frequently Asked Questions

Q: How does the voice input work?

After seeing the [system]> Listening for sound... message, you can speak directly into the microphone. Unlike typical voice assistants, there's no wake command required. Simply start speaking, and the tool will automatically detect and process your voice input. Once you finish speaking, maintain silence for 3 seconds to allow the assistant to process your voice input.

Q: Can I clone a voice?

Many of the models available in Coqui's TTS Toolkit (e.g., tts_models/multilingual/multi-dataset/xtts_v2) support voice cloning. You can use your own speaker profile with a small audio clip (approximately 1 minute for most models). Once you have the clip, you can instruct the assistant to use it with a custom configuration like the following:

{
  "tts": {
    "model": "tts_models/multilingual/multi-dataset/xtts_v2",
    "generation_args": {
      "language": "en",
      "speaker_wav": "/path/to/your/target/voice.wav"
    }
  }
}

Q: Can I use a remote Ollama instance with june?

Yes, you can easily integrate a remotely hosted Ollama instance with june instead of using a local instance. Here's how to do it:

  1. Set the OLLAMA_HOST environment variable to the appropriate URL of your remote Ollama instance.
  2. Run the program as usual.

Example:

To use a remote Ollama instance, you would run a command like this (replace <remote-host> with your server's address):

OLLAMA_HOST=http://<remote-host>:11434 june-va