rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

Docker Image with REST API? #410

Open · Slyke opened this issue 7 months ago

Slyke commented 7 months ago

Hello, I can see that this project is the successor to Mycroft's Mimic3. Is there a Docker image for this project, so that something like this can be achieved:

curl -X POST --data 'Hello world.' --output - localhost:59125/api/tts | aplay

At the moment I have Mimic3 running on my Kubernetes cluster and use it for a bunch of things, but it has some issues parsing SSML. It also looks like the project is abandoned and the main developer has moved here.

I have integrated Mimic3 on my cluster into Home Assistant:

tts:
  - platform: marytts
    host: 10.25.29.113
    port: 59125
    codec: "WAVE_FILE"
    language: "en_US"
    voice: "en_US/hifi-tts_low#92"
  - platform: google_translate

But I would rather have Piper separate from Home Assistant and managed by Kubernetes instead, since the cluster is highly available (several nodes with i7 CPUs) as opposed to a single RPi4. I didn't see anything in the main README about running an API server in Python.

artibex commented 7 months ago

It looks like it's possible: https://www.youtube.com/watch?v=pLR5AsbCMHs. You can create an HTTP server with the project, so if you modify the Dockerfile this could work.

Slyke commented 7 months ago

Ohh yeah, I know it's possible. I was hoping one already existed somewhere. If not, I'll keep using Mimic3 for now. If I ever get some more time, I'll make one and publish it.

artibex commented 7 months ago

@Slyke I think I solved it. Use this example Dockerfile to build an HTTP server with the project:

FROM python:3.11-slim

# Set the working directory
WORKDIR /app

# Install git (needed to fetch the source) and clear the apt cache to keep the image small
RUN apt-get update && apt-get install -y git \
    && rm -rf /var/lib/apt/lists/*

# Get the latest version of the code
RUN git clone https://github.com/rhasspy/piper

# Update pip
RUN pip install --no-cache-dir --upgrade pip

# Switch to the Python runtime package
WORKDIR /app/piper/src/python_run

# Install the package
RUN pip install --no-cache-dir -e .

# Install the runtime requirements and the HTTP server requirements
RUN pip install --no-cache-dir -r requirements.txt -r requirements_http.txt

# Copy your voice files (e.g. from piper-voices/de) into the container;
# each .onnx model needs its matching .onnx.json config next to it
COPY copythis/ /app/models/

# The HTTP server listens on port 5000
EXPOSE 5000

# Run the web server: python -m piper.http_server -m <path to .onnx model>
CMD ["python", "-m", "piper.http_server", "-m", "/app/models/mls-medium.onnx"]

Make sure to reference your own .onnx model, and keep its .onnx.json config file in the same folder.
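Since Piper looks for each model's config next to it (by default <model>.onnx.json), a quick pre-flight check on the folder you copy in can catch a missing config before the server fails at startup. This is a minimal sketch; find_voices is a hypothetical helper, not part of Piper.

```python
from pathlib import Path


def find_voices(models_dir: str) -> list[Path]:
    """Return every .onnx model in models_dir; fail if any lacks its .onnx.json config."""
    models = sorted(Path(models_dir).glob("*.onnx"))
    # Piper's default config path is the model path with ".json" appended
    missing = [m.name for m in models if not m.with_name(m.name + ".json").is_file()]
    if missing:
        raise FileNotFoundError(f"no .onnx.json config for: {', '.join(missing)}")
    return models
```

Running it against /app/models before the CMD (or in CI) turns a confusing server crash into a clear error message.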

ErroneousBosch commented 7 months ago

I'd love to see an official image for this kind of REST API, and I'm surprised one doesn't exist already. I have tried sussing out the protocol and may try something like your suggestion.

artibex commented 7 months ago

@ErroneousBosch It's done: https://hub.docker.com/r/artibex/piper-http

The first version can download Hugging Face models from the voices repo: https://huggingface.co/rhasspy/piper-voices/tree/v1.0.0

Example (the URL is quoted so the shell doesn't treat the ? as a glob character): docker run -e MODEL_DOWNLOAD_LINK="https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/kusal/medium/en_US-kusal-medium.onnx?download=true" --name piper -p 5000:5000 artibex/piper-http
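The download link follows a visible pattern: <language family>/<locale>/<speaker>/<quality>/<locale>-<speaker>-<quality>.onnx under the v1.0.0 tag, with the config at the same path plus .json. Assuming the other voices in the repo use the same layout (inferred from this one example, so check it on Hugging Face), a small helper can build both links from a voice name. voice_urls is a hypothetical name, and it omits the ?download=true suffix from the example; append it if your setup needs it.

```python
BASE = "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0"


def voice_urls(voice: str, base: str = BASE) -> tuple[str, str]:
    """Return (model_url, config_url) for a voice named like 'en_US-kusal-medium'."""
    # Assumed naming scheme: <locale>-<speaker>-<quality>
    locale, rest = voice.split("-", 1)
    speaker, quality = rest.rsplit("-", 1)
    family = locale.split("_")[0]  # "en_US" -> "en"
    model_url = f"{base}/{family}/{locale}/{speaker}/{quality}/{voice}.onnx"
    return model_url, model_url + ".json"
```

For example, voice_urls("en_US-kusal-medium")[0] reproduces the link in the command above, minus the query string.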

Let me know if it works for you too.

ErroneousBosch commented 7 months ago

@artibex It does indeed work!

curl --header "Content-Type: application/json" --request POST --data 'Hello World' --output - "http://localhost:5000" | aplay

It works as expected. It seems to just speak whatever text is sent as the data, so there's no way to configure the speaker, but it definitely works and returns audio!
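For calling the server from Python instead of curl, here is a minimal standard-library sketch. It assumes, as the curl example suggests, that the server reads the raw request body as the text to speak and replies with WAV bytes. The explicit non-form Content-Type matters for a Flask-style server (piper's HTTP server appears to use Flask): with curl's default application/x-www-form-urlencoded, Flask parses the body as form fields instead of exposing it as raw data, which is likely why the curl command sets a header. The function name synthesize is hypothetical.

```python
import urllib.request


def synthesize(text: str, url: str = "http://localhost:5000/", timeout: float = 30.0) -> bytes:
    """POST text to a piper HTTP server and return the WAV bytes it sends back."""
    request = urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        # Non-form content type so a Flask-style server keeps the body raw (assumption)
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    # WAV files start with a RIFF header; anything else is probably an error page
    if not audio.startswith(b"RIFF"):
        raise ValueError(f"unexpected response from {url}: {audio[:80]!r}")
    return audio
```

Usage: Path("hello.wav").write_bytes(synthesize("Hello World")) with pathlib's Path.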

artibex commented 7 months ago

@ErroneousBosch My idea was to create one container per speaker. If you need other voices, just start a second container on a different port. There's no need to wait for a voice to load; just run the container and it works 👍
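A sketch of the one-container-per-voice layout: it builds one docker run argument list per voice, each published on its own host port and mapped to the container's port 5000. The image name and the MODEL_DOWNLOAD_LINK variable come from the example command earlier in this thread; the piper-<voice> container names and the consecutive port numbering are hypothetical choices.

```python
def docker_run_commands(model_links: dict[str, str], base_port: int = 5000,
                        image: str = "artibex/piper-http") -> list[list[str]]:
    """Build one `docker run` argv per voice, each on its own host port."""
    commands = []
    # Sort so a given set of voices always gets the same ports
    for offset, (voice, link) in enumerate(sorted(model_links.items())):
        host_port = base_port + offset
        commands.append([
            "docker", "run", "-d",
            "--name", f"piper-{voice}",           # hypothetical naming scheme
            "-e", f"MODEL_DOWNLOAD_LINK={link}",
            "-p", f"{host_port}:5000",             # the server listens on 5000 inside
            image,
        ])
    return commands
```

Each list can be passed to subprocess.run(cmd, check=True). Because no shell is involved, the ? in the download links needs no quoting.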

One issue is that there is currently no authentication: anyone who knows the host and port can generate .wav files. Any idea how to add it?

artibex commented 7 months ago

I've added a repo with the code for piper-http: https://github.com/artibex/piper-http

ErroneousBosch commented 7 months ago

@artibex While I'm not a Python programmer, I am a developer, and looking at the code for the built-in http_server, there doesn't seem to be any way to specify an API key or other authentication. To add it, you'd need something more sophisticated in front of it.
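If you run your own copy of the server code, one way to put a check in front of it is a small WSGI middleware. This is a minimal sketch, not something Piper provides: since piper's HTTP server appears to be a Flask app, something like app.wsgi_app = ApiKeyMiddleware(app.wsgi_app, key) would wrap it. The class name, the Bearer scheme, and where the key comes from (for example a hypothetical PIPER_API_KEY environment variable) are all illustrative choices.

```python
import hmac


class ApiKeyMiddleware:
    """WSGI middleware that rejects requests without the expected bearer token."""

    def __init__(self, app, api_key: str):
        self.app = app
        self.expected = f"Bearer {api_key}".encode("utf-8")

    def __call__(self, environ, start_response):
        # WSGI exposes headers as latin-1 decoded strings; re-encode to the raw bytes
        supplied = environ.get("HTTP_AUTHORIZATION", "").encode("latin-1")
        # Constant-time comparison so response timing doesn't leak the key
        if not hmac.compare_digest(supplied, self.expected):
            start_response("401 Unauthorized", [
                ("Content-Type", "text/plain"),
                ("WWW-Authenticate", "Bearer"),
            ])
            return [b"missing or invalid API key\n"]
        return self.app(environ, start_response)
```

Clients would then add a header such as --header "Authorization: Bearer <key>" to the curl command. Without TLS the key travels in plain text, so this only makes sense on a trusted network or behind HTTPS.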

Slyke commented 7 months ago

I'm having some issues with this image: Ctrl+C doesn't seem to kill the process when running in WSL2. I added some code to run.py:

import signal
import sys

def signal_handler(sig, frame):
    # Exit the interpreter when Ctrl+C (SIGINT) is received
    print('Terminating process...')
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

I also added -it --init to the docker run command to try to fix this, with no success. I have to run docker stop from another terminal to kill the process and free the first terminal.

It also downloads the model on every startup. It should accept the model name as a parameter and only download the model if it isn't already present. This would require mounting the target_folder as a volume, but that's not a big deal.
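A minimal sketch of the "only download if it doesn't exist" behaviour, assuming the model and its config are stored under their original file names in a mounted folder. It also fetches the <name>.onnx.json config, which Piper expects next to the model. ensure_voice and config_url are hypothetical names, not functions from piper-http's run.py.

```python
import shutil
import urllib.request
from pathlib import Path


def config_url(model_url: str) -> str:
    """The config sits next to the model as <name>.onnx.json; keep any query string."""
    base, sep, query = model_url.partition("?")
    return f"{base}.json{sep}{query}"


def ensure_voice(model_url: str, target_folder: str) -> Path:
    """Download the model and its config into target_folder unless already there."""
    folder = Path(target_folder)
    folder.mkdir(parents=True, exist_ok=True)
    # Drop the query string (e.g. ?download=true) to get the bare file name
    name = model_url.partition("?")[0].rsplit("/", 1)[-1]
    model_path = folder / name
    for url, dest in ((model_url, model_path),
                      (config_url(model_url), folder / f"{name}.json")):
        if dest.exists():
            continue  # already on the mounted volume: skip the download
        partial = dest.with_name(dest.name + ".part")
        with urllib.request.urlopen(url) as resp, open(partial, "wb") as out:
            shutil.copyfileobj(resp, out)
        partial.replace(dest)  # rename only after the download has finished
    return model_path
```

Writing to a .part file first means a container killed mid-download doesn't leave a truncated model that would be mistaken for a complete one on the next start.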

Python is not my strength, so I might use nikolaik/python-nodejs:python3.11-nodejs20-slim and spin up a NodeJS server that allows switching voices, downloading models, etc. through the API. Based on how piper.http_server works, it looks like the piper process will have to be killed when switching voices, unless I can figure out how to stream from Python to NodeJS for download. Then voices could be switched on the fly, options changed, and so on.
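Switching voices without restarting a process mostly comes down to keeping loaded voices in memory and loading new ones on demand. A minimal sketch of that idea, with the loader passed in as a function: in piper's Python package this would presumably be something like PiperVoice.load(model_path), but check the version you install. VoiceCache is a hypothetical name, and max_loaded caps how many models stay in RAM at once.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

V = TypeVar("V")


class VoiceCache(Generic[V]):
    """Keep up to max_loaded voices in memory, evicting the least recently used."""

    def __init__(self, loader: Callable[[str], V], max_loaded: int = 2):
        self._loader = loader
        self._max = max_loaded
        self._voices: "OrderedDict[str, V]" = OrderedDict()

    def get(self, name: str) -> V:
        if name in self._voices:
            self._voices.move_to_end(name)  # mark as most recently used
            return self._voices[name]
        voice = self._loader(name)  # e.g. load the .onnx model from disk
        self._voices[name] = voice
        if len(self._voices) > self._max:
            self._voices.popitem(last=False)  # drop the least recently used voice
        return voice
```

A request handler could then pick the voice from a request parameter via cache.get(name), so only the first request for a voice pays the loading cost.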