matatonic / openedai-speech

An OpenAI API compatible text to speech server using Coqui AI's xtts_v2 and/or piper tts as the backend.
GNU Affero General Public License v3.0

Coqui Xttsv2 doesn't work on Mac GPU #44

Closed Goekdeniz-Guelmez closed 2 months ago

Goekdeniz-Guelmez commented 2 months ago

Hey, I love this project, but when adding a custom voice and calling it via the API I get this error:

{"message":"Error loading voice: josie, KeyError: 'josie'","code":400,"type":"BadRequestError","param":"voice"}

Here is the voice_to_speaker.yaml config file; the wav file exists and is in the specified path:

tts-1-hd:
  josie:
    model: xtts_v2.0.2
    speaker: voices/josie.wav
    language: auto
    enable_text_splitting: True
    length_penalty: 1.0
    repetition_penalty: 10
    speed: 1.0
    temperature: 0.75
    top_k: 50
    top_p: 0.85
    comment: J.O.S.I.E.'s voice is a calm yet professional and smooth style with a little flirty tone.

Here are the logs:

2024-08-15 16:49:39 server-1  | INFO:     Started server process [17]
2024-08-15 16:49:39 server-1  | INFO:     Waiting for application startup.
2024-08-15 16:49:39 server-1  | INFO:     Application startup complete.
2024-08-15 16:49:39 server-1  | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2024-08-15 16:49:49 server-1  | INFO:     192.168.65.1:47932 - "POST /v1/audio/speech HTTP/1.1" 422 Unprocessable Entity
2024-08-15 16:49:55 server-1  | 2024-08-15 14:49:55.018 | INFO     | openedai:openai_statuserror_handler:106 - BadRequestError(message="Error loading voice: josie, KeyError: 'josie'", code=400, param=voice)
2024-08-15 16:49:55 server-1  | INFO:     192.168.65.1:33620 - "POST /v1/audio/speech HTTP/1.1" 400 Bad Request
matatonic commented 2 months ago

Can you enable EXTRA_ARGS=--log-level DEBUG in the speech.env and run it again? I'd like to see the debug logs.
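For reference, the flag is just an extra line in speech.env. A minimal sketch (only the EXTRA_ARGS line comes from this thread; leave any other entries in your file as they are):

```
# speech.env -- sketch; add or edit only the EXTRA_ARGS line,
# keep the rest of your existing settings unchanged
EXTRA_ARGS=--log-level DEBUG
```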

Goekdeniz-Guelmez commented 2 months ago

Here are the new logs:

server-1  | 2024-08-16 15:05:56.147 | DEBUG    | openedai:log_requests:120 - Request path: /v1/audio/speech
server-1  | 2024-08-16 15:05:56.150 | DEBUG    | openedai:log_requests:121 - Request method: POST
server-1  | 2024-08-16 15:05:56.150 | DEBUG    | openedai:log_requests:122 - Request headers: Headers({'host': 'localhost:8000', 'user-agent': 'curl/8.7.1', 'accept': '*/*', 'content-type': 'application/json', 'content-length': '158'})
server-1  | 2024-08-16 15:05:56.150 | DEBUG    | openedai:log_requests:123 - Request query params:
server-1  | 2024-08-16 15:05:56.151 | DEBUG    | openedai:log_requests:124 - Request body: b'{\n    "model": "tts-1",\n    "input": "The quick brown fox jumped over the lazy dog.",\n    "voice": "josie",\n    "response_format": "mp3",\n    "speed": 1.0\n  }'
server-1  | 2024-08-16 15:05:56.171 | INFO     | openedai:openai_statuserror_handler:106 - BadRequestError(message="Error loading voice: josie, KeyError: 'josie'", code=400, param=voice)
server-1  | 2024-08-16 15:05:56.172 | DEBUG    | openedai:log_requests:128 - Response status code: 400
server-1  | 2024-08-16 15:05:56.172 | DEBUG    | openedai:log_requests:129 - Response headers: MutableHeaders({'content-length': '111', 'content-type': 'application/json'})
server-1  | INFO:     192.168.65.1:61633 - "POST /v1/audio/speech HTTP/1.1" 400 Bad Request
matatonic commented 2 months ago

The model setting is 'tts-1'; it should be 'tts-1-hd' for xtts.
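To make the failure mode concrete: voice_to_speaker.yaml is keyed first by model name, so a voice defined only under `tts-1-hd` is invisible to a request for model `tts-1`. A minimal sketch of that lookup (illustrative only, not the project's actual code):

```python
# Hypothetical sketch of the voice lookup in voice_to_speaker.yaml:
# voices are nested under a model name, so requesting the wrong model
# raises KeyError even though the voice entry exists.
config = {
    "tts-1-hd": {
        "josie": {"model": "xtts_v2.0.2", "speaker": "voices/josie.wav"},
    },
}

def lookup_voice(model: str, voice: str) -> dict:
    """Return the voice entry, turning a miss into an error like the server's 400."""
    try:
        return config[model][voice]
    except KeyError as e:
        raise ValueError(f"Error loading voice: {voice}, KeyError: {e}")

print(lookup_voice("tts-1-hd", "josie")["speaker"])  # voices/josie.wav
```

Calling `lookup_voice("tts-1", "josie")` against the same config raises the error, matching the `KeyError: 'josie'` seen in the logs.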

Goekdeniz-Guelmez commented 2 months ago

I still get the same error, sadly:

2024-08-16 17:21:17 server-1  | 2024-08-16 15:21:17.468 | DEBUG    | openedai:log_requests:120 - Request path: /v1/audio/speech
2024-08-16 17:21:17 server-1  | 2024-08-16 15:21:17.469 | DEBUG    | openedai:log_requests:121 - Request method: POST
2024-08-16 17:21:17 server-1  | 2024-08-16 15:21:17.469 | DEBUG    | openedai:log_requests:122 - Request headers: Headers({'host': 'localhost:8000', 'user-agent': 'curl/8.7.1', 'accept': '*/*', 'content-type': 'application/json', 'content-length': '161'})
2024-08-16 17:21:17 server-1  | 2024-08-16 15:21:17.469 | DEBUG    | openedai:log_requests:123 - Request query params: 
2024-08-16 17:21:17 server-1  | 2024-08-16 15:21:17.469 | DEBUG    | openedai:log_requests:124 - Request body: b'{\n    "model": "tts-1-hd",\n    "input": "The quick brown fox jumped over the lazy dog.",\n    "voice": "josie",\n    "response_format": "mp3",\n    "speed": 1.0\n  }'
2024-08-16 17:21:17 server-1  | 2024-08-16 15:21:17.478 | INFO     | openedai:openai_statuserror_handler:106 - BadRequestError(message="Error loading voice: josie, KeyError: 'josie'", code=400, param=voice)
2024-08-16 17:21:17 server-1  | 2024-08-16 15:21:17.479 | DEBUG    | openedai:log_requests:128 - Response status code: 400
2024-08-16 17:21:17 server-1  | 2024-08-16 15:21:17.479 | DEBUG    | openedai:log_requests:129 - Response headers: MutableHeaders({'content-length': '111', 'content-type': 'application/json'})
2024-08-16 17:21:17 server-1  | INFO:     192.168.65.1:39915 - "POST /v1/audio/speech HTTP/1.1" 400 Bad Request
matatonic commented 2 months ago

I copied your config snippet and tested it with no errors, so I suspect your config is not being used.

python say.py -v josie -m tts-1-hd -t 'The quick brown fox jumped over the lazy dog.' -p

You're using Docker; are you using the default docker-compose.yaml or the -min image? For xtts (tts-1-hd) you need the standard image (not -min). If you are using the standard image, can you show me the full startup logs for the server?

Goekdeniz-Guelmez commented 2 months ago

Ahhhh ok, yes, I'm using the -min docker compose file because I'm on an M1 Mac. Should I still give you the startup logs for the server?

matatonic commented 2 months ago

Unfortunately xtts doesn't work on a Mac yet, so you're stuck with just piper for now (no voice cloning). There are lots of good piper voices though; be sure to look through the samples linked in the documentation, and you may find one you like better than the defaults.
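If it helps, a piper voice would go under tts-1 rather than tts-1-hd in voice_to_speaker.yaml. A sketch only: the model path and speaker id below are placeholders, not verified values from this repo.

```yaml
tts-1:
  josie:
    model: voices/en_US-libritts_r-medium.onnx  # placeholder: path to a downloaded piper .onnx model
    speaker: 0                                  # placeholder: speaker id, for multi-speaker piper models
```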

Goekdeniz-Guelmez commented 2 months ago

Thanks for your help! Is there a way I can be notified when it's possible on M-series Macs?

matatonic commented 2 months ago

It could be a long time; the low-level math code is still missing. But if/when it's done, and if I remember, I will update this ticket.