rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

First Call to Webserver works, second call does not and needs to restart server #552

Open CStrue opened 1 month ago

CStrue commented 1 month ago

~/projects/piper/piper/src/python_run $ python3 -m piper.http_server --model ../../../voices/german/de_DE-thorsten-high.onnx

...

INFO:werkzeug:Press CTRL+C to quit

=================== First call - successful 200

INFO:werkzeug: - - [23/Jul/2024 22:26:24] "GET /?text=Mein+Name+ist+Skynet+und+ich+bin+dein+persönlicher+Assistent.+Wie+kann+ich+dir+heute+helfen?. HTTP/1.1" 200 -

===================== second call - throwing error

2024-07-23 22:26:29.666429175 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running GatherElements node. Name:'/dp/flows.7/GatherElements_3' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/gather_elements.cc:154 void onnxruntime::core_impl(const Tensor, const Tensor, Tensor, int64_t, concurrency::ThreadPool) [with Tin = long int; int64_t = long int] GatherElements op: Out of range value in index tensor

ERROR:http_server:Exception on / [GET]
Traceback (most recent call last):
  File "/home/admin/projects/piper/.venv/lib/python3.11/site-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/projects/piper/.venv/lib/python3.11/site-packages/flask/app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/projects/piper/.venv/lib/python3.11/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/projects/piper/.venv/lib/python3.11/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(view_args) # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/projects/piper/piper/src/python_run/piper/http_server.py", line 119, in app_synthesize
    voice.synthesize(text, wav_file, synthesize_args)
  File "/home/admin/projects/piper/piper/src/python_run/piper/voice.py", line 104, in synthesize
    for audio_bytes in self.synthesize_stream_raw(
  File "/home/admin/projects/piper/piper/src/python_run/piper/voice.py", line 132, in synthesize_stream_raw
    yield self.synthesize_ids_to_raw(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/projects/piper/piper/src/python_run/piper/voice.py", line 183, in synthesize_ids_to_raw
    audio = self.session.run(None, args, )[0].squeeze((0, 1))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/projects/piper/.venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running GatherElements node. Name:'/dp/flows.7/GatherElements_3' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/gather_elements.cc:154 void onnxruntime::core_impl(const Tensor, const Tensor, Tensor, int64_t, concurrency::ThreadPool) [with Tin = long int; int64_t = long int] GatherElements op: Out of range value in index tensor

INFO:werkzeug: - - [23/Jul/2024 22:26:29] "GET /?text=Mein+Name+ist+Skynet+und+ich+bin+dein+persönlicher+Assistent.+Wie+kann+ich+dir+heute+helfen?. HTTP/1.1" 500 -
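
To reproduce this without a browser, something like the following could be used. It is only a sketch: it sends the same GET request several times and collects the status codes. The base URL (localhost, port 5000) and the helper name `request_statuses` are assumptions for illustration, not taken from the log above, so adjust them to your setup.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the server listens on localhost:5000; change this to match your setup.
BASE_URL = "http://localhost:5000/"


def request_statuses(text, times=2, base_url=BASE_URL, opener=urllib.request.urlopen):
    """Send the same GET request `times` times and return the HTTP status codes."""
    url = base_url + "?" + urllib.parse.urlencode({"text": text})
    statuses = []
    for _ in range(times):
        try:
            with opener(url) as response:
                statuses.append(response.status)
        except urllib.error.HTTPError as err:
            # urlopen raises on 4xx/5xx responses; record the code and keep going.
            statuses.append(err.code)
    return statuses
```

Calling `request_statuses("Mein Name ist Skynet.")` against a running server returns one status code per call, so the behaviour reported above would show up as a 200 followed by a 500.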

tebbbb commented 1 month ago

I have the same problem (with a slightly different error) and also reported it here: https://github.com/artibex/piper-http/issues/3

tebbbb commented 1 month ago

@CStrue just in case, I fixed it by using this:

pip install -U piper-phonemize
pip install onnxruntime==1.17.1

instead of using the requirements.txt. The current onnxruntime version, 1.18, doesn't seem to work.
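
To see which onnxruntime version is actually installed in the venv, a small check along these lines may help. It is a sketch only: the helper names are made up for illustration, and "newer than 1.17.1" is just a comparison against the one version reported working in this thread, not a statement about which releases are affected.

```python
from importlib import metadata

# The only version reported as working in this thread.
KNOWN_GOOD = "1.17.1"


def release_tuple(version):
    """Turn '1.18.0' into (1, 18, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def newer_than_known_good(version, known_good=KNOWN_GOOD):
    """Return True if `version` is newer than the version reported working."""
    return release_tuple(version) > release_tuple(known_good)


def installed_onnxruntime_version():
    """Return the installed onnxruntime version string, or None if it is missing."""
    try:
        return metadata.version("onnxruntime")
    except metadata.PackageNotFoundError:
        return None
```

If `installed_onnxruntime_version()` returns a version for which `newer_than_known_good(...)` is true, pinning as shown above is worth trying.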

CStrue commented 1 month ago

I found a solution tonight and will paste it later. Basically, it seems to be a problem in the state management: loading everything (including the voice models) inside the server's GET/POST handler solves the problem, since the server becomes stateless that way.

CStrue commented 1 month ago

Server-side code: I moved all the code into the Flask route handler. It seems necessary to load the voice on every request to avoid the issue. You should also be able to move the code into the original HTTP server Python file. The downside is that loading the voice takes some time, but for my purposes it's still fast enough.

from flask import Flask, request, Response
import io
import wave
import logging
from piper import PiperVoice

app = Flask(__name__)
logging.basicConfig(level=logging.ERROR)

@app.route('/synthesize', methods=['GET'])
def synthesize():
    try:
        text = request.args.get('text', '')
        if not text:
            return "Text parameter is required", 400

        # Load the voice on every request: slower, but no state is kept between calls.
        voice = PiperVoice.load(
            'voices/german/de_DE-thorsten-high.onnx',
            config_path='voices/german/de_DE-thorsten-high.onnx.json'
        )

        wav_io = io.BytesIO()
        with wave.open(wav_io, 'wb') as wav_file:
            voice.synthesize(text, wav_file)

        # Return the finished WAV as bytes instead of handing Flask the BytesIO object.
        return Response(
            wav_io.getvalue(),
            mimetype='audio/wav',
            headers={"Content-Disposition": "attachment;filename=output.wav"}
        )
    except Exception as e:
        logging.error(f"Error during synthesis: {e}")
        return "An error occurred during synthesis", 500

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000)

Python client-side code to test the server:

import requests
import sounddevice as sd
import numpy as np
import io
import wave

textToSpeak = "Mein Name ist Skynet und ich bin dein persönlicher Assistent. Wie kann ich dir heute helfen?."
urlPiper = "http://localhost:5000/synthesize"

payload = {'text': textToSpeak}
response = requests.get(urlPiper, params=payload, stream=True)

if response.status_code == 200:
    with io.BytesIO() as audio_stream:
        for chunk in response.iter_content(chunk_size=128):
            audio_stream.write(chunk)

        audio_stream.seek(0)

        with wave.open(audio_stream, 'rb') as wf:
            samplerate = wf.getframerate()
            channels = wf.getnchannels()
            width = wf.getsampwidth()
            data = wf.readframes(wf.getnframes())

            if width == 2:  # 16-bit samples
                dtype = np.int16
            elif width == 4:  # 32-bit samples
                dtype = np.int32
            else:
                raise ValueError(f"Unsupported sample width: {width}")

            audio_data = np.frombuffer(data, dtype=dtype)

            if channels == 2:
                audio_data = audio_data.reshape(-1, 2)

            print("Available devices:")
            for i, device in enumerate(sd.query_devices()):
                print(f"{i}: {device['name']}")

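            # Pick an index from the list printed above; 11 is machine-specific.
            device_id = 11
            print(f"Using device ID: {device_id}")

            sd.play(audio_data, samplerate=samplerate, device=device_id)
            sd.wait()  # Block until playback has finished.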
else:
    print(f"Failed to get audio. Status code: {response.status_code}")
    print("Response content:", response.text)
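
If no audio device is available (for example on a headless server), the response can still be checked without sounddevice or numpy. The following is a minimal sketch using only the standard library; the helper name `wav_summary` is made up for illustration.

```python
import io
import wave


def wav_summary(wav_bytes):
    """Parse WAV bytes and return basic format information as a dict."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),  # bytes per sample
            "sample_rate": rate,
            "frames": frames,
            "seconds": frames / rate,
        }
```

With the client above, `wav_summary(response.content)` should return the format details if the server sent a valid WAV file, and `wave.open` raises `wave.Error` if the body is not WAV data.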