# OpenAI-Compatible Edge-TTS API 🗣️


This project provides a local, OpenAI-compatible text-to-speech (TTS) API using `edge-tts`. It emulates the OpenAI TTS endpoint (`/v1/audio/speech`), enabling users to generate speech from text with various voice options and playback speeds, just like the OpenAI API.

`edge-tts` uses Microsoft Edge's online text-to-speech service, so it is completely free.
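For a sense of what the server wraps, here is a minimal sketch of using the `edge-tts` library directly (the voice name is taken from the defaults below; this project's actual internals may differ):

```python
# pip install edge-tts
import asyncio

import edge_tts


async def main() -> None:
    # Ask Microsoft Edge's online TTS service to synthesize a phrase
    # and save the result as an MP3 file.
    communicate = edge_tts.Communicate("Hello from Edge TTS!", "en-US-AndrewNeural")
    await communicate.save("hello.mp3")


asyncio.run(main())
```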

View this project on Docker Hub

Please ⭐️ star this repo if you find it helpful

## Features

- OpenAI-compatible `/v1/audio/speech` endpoint
- Multiple voices, playback speeds, and response formats
- Completely free text-to-speech via Microsoft Edge's online service
- Runs locally with Docker or plain Python

## Getting Started

### Prerequisites

- Docker and Docker Compose (recommended), or
- Python 3 with `pip` (see Running with Python below)

### Installation

1. **Clone the Repository**:

   ```bash
   git clone https://github.com/travisvn/openai-edge-tts.git
   cd openai-edge-tts
   ```

2. **Environment Variables**: Create a `.env` file in the root directory with the following variables:

   ```
   API_KEY=your_api_key_here
   PORT=5050

   DEFAULT_VOICE=en-US-AndrewNeural
   DEFAULT_RESPONSE_FORMAT=mp3
   DEFAULT_SPEED=1.0

   DEFAULT_LANGUAGE=en-US

   REQUIRE_API_KEY=True
   ```

   Or, copy the default `.env.example` with the following:

   ```bash
   cp .env.example .env
   ```
3. **Run with Docker Compose** (recommended):

   ```bash
   docker compose up --build
   ```

   (Note: `docker-compose` is not the same as `docker compose`)

   Run with `-d` to run Docker Compose in "detached mode", meaning it will run in the background and free up your terminal:

   ```bash
   docker compose up -d
   ```

   Alternatively, run directly with Docker:

   ```bash
   docker build -t openai-edge-tts .
   docker run -p 5050:5050 --env-file .env openai-edge-tts
   ```

   To run the container in the background, add `-d` after the `docker run` command:

   ```bash
   docker run -d -p 5050:5050 --env-file .env openai-edge-tts
   ```

4. **Access the API**: Your server will be accessible at `http://localhost:5050`.
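To confirm the server is responding, here is a quick Python smoke test (a minimal sketch using only the standard library, assuming the `API_KEY` value and port from the `.env` above):

```python
import json
import urllib.request

# POST a short phrase to the local endpoint and save the MP3 response.
req = urllib.request.Request(
    "http://localhost:5050/v1/audio/speech",
    data=json.dumps({"input": "The server is up.", "voice": "alloy"}).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer your_api_key_here",  # must match API_KEY in .env
    },
)
with urllib.request.urlopen(req) as resp, open("check.mp3", "wb") as out:
    out.write(resp.read())
print("Wrote check.mp3")
```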

## Running with Python

If you prefer to run this project directly with Python, follow these steps to set up a virtual environment, install dependencies, and start the server.

#### 1. Clone the Repository

```bash
git clone https://github.com/travisvn/openai-edge-tts.git
cd openai-edge-tts
```

#### 2. Set Up a Virtual Environment

Create and activate a virtual environment to isolate dependencies:

```bash
# For macOS/Linux
python3 -m venv venv
source venv/bin/activate

# For Windows
python -m venv venv
venv\Scripts\activate
```

#### 3. Install Dependencies

Use `pip` to install the required packages listed in `requirements.txt`:

```bash
pip install -r requirements.txt
```

#### 4. Configure Environment Variables

Create a `.env` file in the root directory and set the following variables:

```
API_KEY=your_api_key_here
PORT=5050

DEFAULT_VOICE=en-US-AndrewNeural
DEFAULT_RESPONSE_FORMAT=mp3
DEFAULT_SPEED=1.0

DEFAULT_LANGUAGE=en-US

REQUIRE_API_KEY=True
```
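For reference, here is a minimal sketch of how a server like this typically loads such values at startup (assuming the python-dotenv package; the variable names match the `.env` above, but the project's actual loading code may differ):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

API_KEY = os.getenv("API_KEY", "your_api_key_here")
PORT = int(os.getenv("PORT", "5050"))
DEFAULT_VOICE = os.getenv("DEFAULT_VOICE", "en-US-AndrewNeural")
DEFAULT_RESPONSE_FORMAT = os.getenv("DEFAULT_RESPONSE_FORMAT", "mp3")
DEFAULT_SPEED = float(os.getenv("DEFAULT_SPEED", "1.0"))
REQUIRE_API_KEY = os.getenv("REQUIRE_API_KEY", "True").lower() == "true"
```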

#### 5. Run the Server

Once configured, start the server with:

```bash
python app/server.py
```

The server will start running at `http://localhost:5050`.

#### 6. Test the API

You can now interact with the API at `http://localhost:5050/v1/audio/speech` and other available endpoints. See the Usage section below for request examples.

## Usage

### Endpoint: `/v1/audio/speech`

Generates audio from the input text. Available parameters:

**Required Parameter:**

- `input` (string): The text to convert to speech.

**Optional Parameters:**

- `model` (string): Accepted for OpenAI compatibility, e.g. `tts-1` (see the second example below).
- `voice` (string): An Edge TTS voice (e.g. `en-US-AndrewNeural`) or an OpenAI-style voice name (e.g. `alloy`, `echo`). Defaults to `DEFAULT_VOICE`.
- `response_format` (string): The audio format of the output, e.g. `mp3`. Defaults to `DEFAULT_RESPONSE_FORMAT`.
- `speed` (number): Playback speed, e.g. `1.0`. Defaults to `DEFAULT_SPEED`.
Example request with `curl`, saving the output to an mp3 file:

```bash
curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "input": "Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
    "voice": "echo",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3
```

Or, to be in line with the OpenAI API endpoint parameters:

```bash
curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "tts-1",
    "input": "Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
    "voice": "alloy"
  }' \
  --output speech.mp3
```

And an example of a language other than English (Japanese: "Alright, I'm going. I'll look up the train times."):

```bash
curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "tts-1",
    "input": "じゃあ、行く。電車の時間、調べておくよ。",
    "voice": "ja-JP-KeitaNeural"
  }' \
  --output speech.mp3
```
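Because the endpoint mirrors OpenAI's, the official `openai` Python client can also be pointed at the local server; a minimal sketch, assuming the key and port configured above:

```python
# pip install openai
from openai import OpenAI

# Point the official OpenAI client at the local server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:5050/v1", api_key="your_api_key_here")

with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Hello, I am your AI assistant!",
) as response:
    response.stream_to_file("speech.mp3")
```

Swapping `base_url` back to the real OpenAI endpoint is the only change needed to move between the two.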

### Additional Endpoints

## Contributing

Contributions are welcome! Please fork the repository and create a pull request for any improvements.

## License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0), and the intended acceptable use case is personal use. For enterprise or non-personal use of openai-edge-tts, contact me at tts@travisvn.com.


## Example Use Case

### Open WebUI

Open up the Admin Panel and go to Settings -> Audio.

Below, you can see a screenshot of the correct configuration for using this project to substitute for the OpenAI endpoint:

*Screenshot of Open WebUI Admin Settings for Audio, adding the correct endpoints for this project*

> [!NOTE]
> View the official docs for Open WebUI integration with OpenAI Edge TTS

### AnythingLLM

In version 1.6.8, AnythingLLM added support for "generic OpenAI TTS providers", which means this project can be used as the TTS provider in AnythingLLM.

Open up settings and go to Voice & Speech (under AI Providers).

Below, you can see a screenshot of the correct configuration for using this project to substitute for the OpenAI endpoint:

*Screenshot of AnythingLLM settings for Voice, adding the correct endpoints for this project*


## Quick Info


## Voice Samples 🎙️

[Play voice samples and see all available Edge TTS voices](https://tts.travisvn.com)