matatonic / openedai-speech

An OpenAI API compatible text to speech server using Coqui AI's xtts_v2 and/or piper tts as the backend.
GNU Affero General Public License v3.0

Server error when invoking the API from OPENWEBUI #11

Closed cachibachero closed 1 month ago

cachibachero commented 1 month ago

I tried both the manual and the Docker installation. Both gave the same error: External: 500 Server Error: Internal Server Error for url: http://openwebui.local:8000/v1/audio/speech

matatonic commented 1 month ago

Can you share more info? Logs from both servers, please, plus any docker command lines or docker-compose.yml files and the info from the audio/openai settings page.

atmaker commented 1 month ago

I'm getting the same error message in Open WebUI, but I'm not sure whether it has the same cause. In my case, some of the errors point towards the model names or their settings. The voice_to_speaker.yaml is the default. One interesting thing: at some point the logs contained a line pointing to a voice named 'English (America, New York City)+male1'. No idea where that came from.

OpenedAi-Speech log:

sudo docker-compose logs openedai-speech
openedai-speech  | Traceback (most recent call last):
openedai-speech  |   File "<string>", line 1, in <module>
openedai-speech  |   File "/usr/local/lib/python3.11/site-packages/TTS/utils/manage.py", line 385, in download_model
openedai-speech  |     model_item, model_full_name, model, md5sum = self._set_model_item(model_name)
openedai-speech  |                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
openedai-speech  |   File "/usr/local/lib/python3.11/site-packages/TTS/utils/manage.py", line 300, in _set_model_item
openedai-speech  |     model_type, lang, dataset, model = model_name.split("/")
openedai-speech  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
openedai-speech  | ValueError: not enough values to unpack (expected 4, got 2)
openedai-speech  | /usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
openedai-speech  |   warnings.warn(
openedai-speech  | /usr/local/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
openedai-speech  |   warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
openedai-speech  | INFO:     Started server process [75]
openedai-speech  | INFO:     Waiting for application startup.
openedai-speech  | INFO:     Application startup complete.
openedai-speech  | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
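
The ValueError in the traceback above comes from unpacking `model_name.split("/")` into four names. A minimal sketch of that failure mode, assuming a Coqui-style name with four `/`-separated parts; `split_model_name` and the two-part name `some-org/some-model` are hypothetical and only illustrate the log line, they are not the TTS library's code or the name that was actually passed:

```python
def split_model_name(model_name: str) -> tuple[str, str, str, str]:
    """Split a 'type/lang/dataset/model' name, failing with a readable message."""
    parts = model_name.split("/")
    if len(parts) != 4:
        # The traceback's bare unpacking raises here with
        # "not enough values to unpack (expected 4, got 2)".
        raise ValueError(
            f"expected 4 '/'-separated parts, got {len(parts)}: {model_name!r}"
        )
    model_type, lang, dataset, model = parts
    return model_type, lang, dataset, model


if __name__ == "__main__":
    print(split_model_name("tts_models/multilingual/multi-dataset/xtts_v2"))
    try:
        split_model_name("some-org/some-model")  # hypothetical two-part name
    except ValueError as err:
        print(err)
```

Any name with fewer than four parts fails the same way, which fits the "expected 4, got 2" message in the log.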

Open WebUI log:

open-webui  | ERROR:apps.audio.main:500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | Traceback (most recent call last):
open-webui  |   File "/app/backend/apps/audio/main.py", line 219, in speech
open-webui  |     r.raise_for_status()
open-webui  |   File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
open-webui  |     raise HTTPError(http_error_msg, response=self)
open-webui  | requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | INFO:     172.25.0.1:45824 - "POST /audio/api/v1/speech HTTP/1.1" 500 Internal Server Error
open-webui  | ERROR:apps.audio.main:500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | Traceback (most recent call last):
open-webui  |   File "/app/backend/apps/audio/main.py", line 219, in speech
open-webui  |     r.raise_for_status()
open-webui  |   File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
open-webui  |     raise HTTPError(http_error_msg, response=self)
open-webui  | requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | INFO:     172.25.0.1:45824 - "POST /audio/api/v1/speech HTTP/1.1" 500 Internal Server Error
open-webui  | ERROR:apps.audio.main:500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | Traceback (most recent call last):
open-webui  |   File "/app/backend/apps/audio/main.py", line 219, in speech
open-webui  |     r.raise_for_status()
open-webui  |   File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
open-webui  |     raise HTTPError(http_error_msg, response=self)
open-webui  | requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | INFO:     172.25.0.1:45824 - "POST /audio/api/v1/speech HTTP/1.1" 500 Internal Server Error
open-webui  | ['KCNtPEoaG2XRE5JzAAAD']
open-webui  | INFO:     127.0.0.1:59056 - "GET /health HTTP/1.1" 200 OK
open-webui  | ERROR:apps.audio.main:500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | Traceback (most recent call last):
open-webui  |   File "/app/backend/apps/audio/main.py", line 219, in speech
open-webui  |     r.raise_for_status()
open-webui  |   File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
open-webui  |     raise HTTPError(http_error_msg, response=self)
open-webui  | requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | INFO:     172.25.0.1:40830 - "POST /audio/api/v1/speech HTTP/1.1" 500 Internal Server Error
open-webui  | ERROR:apps.audio.main:500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | Traceback (most recent call last):
open-webui  |   File "/app/backend/apps/audio/main.py", line 219, in speech
open-webui  |     r.raise_for_status()
open-webui  |   File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
open-webui  |     raise HTTPError(http_error_msg, response=self)
open-webui  | requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | INFO:     172.25.0.1:40830 - "POST /audio/api/v1/speech HTTP/1.1" 500 Internal Server Error
open-webui  | ERROR:apps.audio.main:500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | Traceback (most recent call last):
open-webui  |   File "/app/backend/apps/audio/main.py", line 219, in speech
open-webui  |     r.raise_for_status()
open-webui  |   File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
open-webui  |     raise HTTPError(http_error_msg, response=self)
open-webui  | requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | INFO:     172.25.0.1:40830 - "POST /audio/api/v1/speech HTTP/1.1" 500 Internal Server Error
open-webui  | ERROR:apps.audio.main:500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | Traceback (most recent call last):
open-webui  |   File "/app/backend/apps/audio/main.py", line 219, in speech
open-webui  |     r.raise_for_status()
open-webui  |   File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
open-webui  |     raise HTTPError(http_error_msg, response=self)
open-webui  | requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | INFO:     172.25.0.1:60264 - "POST /audio/api/v1/speech HTTP/1.1" 500 Internal Server Error
open-webui  | ERROR:apps.audio.main:500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | Traceback (most recent call last):
open-webui  |   File "/app/backend/apps/audio/main.py", line 219, in speech
open-webui  |     r.raise_for_status()
open-webui  |   File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
open-webui  |     raise HTTPError(http_error_msg, response=self)
open-webui  | requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | INFO:     172.25.0.1:60264 - "POST /audio/api/v1/speech HTTP/1.1" 500 Internal Server Error
open-webui  | ERROR:apps.audio.main:500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | Traceback (most recent call last):
open-webui  |   File "/app/backend/apps/audio/main.py", line 219, in speech
open-webui  |     r.raise_for_status()
open-webui  |   File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
open-webui  |     raise HTTPError(http_error_msg, response=self)
open-webui  | requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://openedai-speech:8000/v1/audio/speech
open-webui  | INFO:     172.25.0.1:60264 - "POST /audio/api/v1/speech HTTP/1.1" 500 Internal Server Error
open-webui  | INFO:     127.0.0.1:56652 - "GET /health HTTP/1.1" 200 OK

docker-compose.yaml:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:cuda
    container_name: open-webui
    ports:
      - "3000:8080"
    environment:
      - AUTOMATIC1111_BASE_URL=http://localhost:7860
      - ENABLE_IMAGE_GENERATION=True
      - AUDIO_OPENAI_API_BASE_URL=http://openedai-speech:8000/v1  # Updated to use service name
      - AUDIO_OPENAI_API_KEY=sk-111111111  # Add TTS environment variables here if needed
      - AUDIO_OPENAI_API_MODEL=tts-1
      - AUDIO_OPENAI_API_VOICE=alloy
    volumes:
      - open-webui:/app/backend/data
    restart: always
    extra_hosts:
      - "host.docker.internal:host-gateway"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['all']
              capabilities: [gpu]
    depends_on:
      - ollama
      - searxng
      - openedai-speech

  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "11434:11434"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    healthcheck:
      test: ["CMD-SHELL", "ollama --version || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
    command: serve
    volumes:
      - ollama:/root/.ollama
    restart: always
    extra_hosts:
      - "host.docker.internal:host-gateway"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['all']
              capabilities: [gpu]

  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    ports:
      - "8080:8080"
    environment:
      - ENABLE_RAG_WEB_SEARCH=True
      - RAG_WEB_SEARCH_ENGINE=searxng
      - RAG_WEB_SEARCH_RESULT_COUNT=3
      - RAG_WEB_SEARCH_CONCURRENT_REQUESTS=10
      - SEARXNG_QUERY_URL=http://host.docker.internal:8080/search?q=<query>
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://searxng:8080/config | grep -o '\"version\": *\"[^\"]*\"' | awk -F '\"' '{print $4}' || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    volumes:
      - ./searxng:/etc/searxng
    restart: always
    extra_hosts:
      - "host.docker.internal:host-gateway"

  openedai-speech:
    image: ghcr.io/matatonic/openedai-speech
    container_name: openedai-speech
    env_file: /home/atte/ai-docker-compose/openedai-speech/.env
    environment:
      - AUDIO_OPENAI_API_BASE_URL=http://host.docker.internal:8000/v1
      - AUDIO_OPENAI_API_KEY=sk-111111111
      - AUDIO_OPENAI_API_MODEL=tts-1
      - AUDIO_OPENAI_API_VOICE=alloy
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD-SHELL", "python /app/speech.py --help || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
    volumes:
      - tts-voices:/app/voices
      - tts-config:/app/config
    # labels:
    #   - "com.centurylinklabs.watchtower.enable=true"
    restart: always
    extra_hosts:
      - "host.docker.internal:host-gateway"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['all']
              capabilities: [gpu]

volumes:
  ollama:
  searxng:
  open-webui:
  tts-voices:
  tts-config:

matatonic commented 1 month ago

Can you share the voice_to_speaker.yaml? It sounds like it was corrupted somehow.

matatonic commented 1 month ago

Alternately, if you're running the latest openedai-speech and are just using defaults you can delete it and it will be recreated.

atmaker commented 1 month ago

> Alternately, if you're running the latest openedai-speech and are just using defaults you can delete it and it will be recreated.

I tried this. There are small changes in the logs, but in the bigger picture the error remains the same. This time I could reproduce the mysterious KeyError: 'English (America, New York City)+male1' on the last line of the OpenedAi Speech log.

voice_to_speaker.yaml

tts-1:
  some_other_voice_name_you_want:
    model: voices/choose your own model.onnx
    speaker: set your own speaker
  alloy:
    model: voices/en_US-libritts_r-medium.onnx
    speaker: 79 # 64, 79, 80, 101, 130
  echo:
    model: voices/en_US-libritts_r-medium.onnx
    speaker: 134 # 52, 102, 134
  echo-alt:
    model: voices/en_US-ryan-high.onnx
    speaker: # default speaker
  fable:
    model: voices/en_GB-northern_english_male-medium.onnx
    speaker: # default speaker
  onyx:
    model: voices/en_US-libritts_r-medium.onnx
    speaker: 159 # 55, 90, 132, 136, 137, 159
  nova:
    model: voices/en_US-libritts_r-medium.onnx
    speaker: 107 # 57, 61, 107, 150, 162
  shimmer:
    model: voices/en_US-libritts_r-medium.onnx
    speaker: 163
tts-1-hd:
  alloy:
    model: xtts
    speaker: voices/alloy-alt.wav
  alloy-orig: 
    model: xtts
    speaker: voices/alloy.wav # it's REALLY BAD
  echo:
    model: xtts
    speaker: voices/echo.wav
  fable:
    model: xtts
    speaker: voices/fable.wav
  onyx:
    model: xtts
    speaker: voices/onyx.wav
  nova:
    model: xtts
    speaker: voices/nova.wav
  shimmer:
    model: xtts
    speaker: voices/shimmer.wav
  me:
    model: xtts_v2.0.2 # you can specify different xtts version
    speaker: voices/me.wav # this could be you
  parler:
    model: parler-tts/parler_tts_mini_v0.1
    speaker: A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast.
  parler2:
    model: parler-tts/parler_tts_mini_v0.1
    speaker: A female voice with an Indian accent enunciates every word with precision. The speaker's voice is very close-sounding, and the recording is excellent, capturing her voice with crisp clarity.

open-webui.log openedai-speech.log

matatonic commented 1 month ago

wth... "English (America, New York City)+male1" where does that come from?? I need to add some more error handling and logging asap. I also just started using it with open-webui (I'm not seeing this error) but I am seeing other (harmless?) errors and I'm not sure which side is causing them yet.

Thanks for the report.

matatonic commented 1 month ago

I've updated the logging and error handling; please update to 0.12.2 and try again. The "latest" image may not be updated for you yet (at the time of writing this), but you can:

docker pull ghcr.io/matatonic/openedai-speech:0.12.2

eav-solution commented 1 month ago

Same error with 0.12.2: Failed to load resource: the server responded with a status of 400 (Bad Request)

matatonic commented 1 month ago

I've fixed the problem with the :latest tag. If you're still having problems, please set this in your speech.env:

TTS_HOME=voices
HF_HOME=voices
OPENEDAI_LOG_LEVEL=DEBUG

i.e., please set the above environment variable (OPENEDAI_LOG_LEVEL=DEBUG), restart the server, and try again.

Please upload the resulting logs.

400 (Bad Request) should mean the request is bad (client side). What are the details?
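
One way to get those details without Open WebUI in the loop is to send the request to the speech server directly and print the status and response body. A minimal sketch, assuming the server is reachable at http://localhost:8000/v1 (the port mapping from the docker-compose.yaml earlier in the thread) and accepts the OpenAI-style JSON body with `model`, `voice` and `input`; the helper names are hypothetical, and whether the server checks the API key at all is not confirmed here:

```python
import json
import urllib.error
import urllib.request


def build_speech_request(base_url, model, voice, text, api_key="sk-111111111"):
    """Build a POST request for <base_url>/audio/speech (OpenAI-style body)."""
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumption: key may be ignored
        },
        method="POST",
    )


def describe_response(req):
    """Send the request and summarise the outcome, including any error body."""
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            return f"{resp.status} OK, {len(resp.read())} bytes of audio"
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses land here; the body usually carries the detail.
        return f"{err.code} {err.reason}: {err.read().decode('utf-8', 'replace')}"


if __name__ == "__main__":
    request = build_speech_request(
        "http://localhost:8000/v1", "tts-1", "alloy", "Hello from the test script."
    )
    print(describe_response(request))
```

If this succeeds with `tts-1` and `alloy` while Open WebUI still fails, the difference is most likely in what the client sends (for example the voice name).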

matatonic commented 1 month ago

There is now additional logging for 400 Bad Request errors, even without DEBUG (0.12.3). This is likely a config or client-side issue though: 400 errors only happen in a few cases where the voice, model, or input is not found. It could be a voice config problem (wrong voice name), or using the -min image with tts-1-hd and expecting tts-1-hd voices to work.
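
The "voice or model not found" case can be checked by hand against voice_to_speaker.yaml. A minimal sketch, assuming the file's two-level layout (model name, then voice name) as posted above; `VOICE_MAP` is a hand-copied excerpt rather than a parsed file, and `check_voice` is a hypothetical helper, not the server's own lookup:

```python
# Excerpt of the voice_to_speaker.yaml structure posted above, as a plain dict.
VOICE_MAP = {
    "tts-1": {
        "alloy": {"model": "voices/en_US-libritts_r-medium.onnx", "speaker": 79},
        "echo": {"model": "voices/en_US-libritts_r-medium.onnx", "speaker": 134},
    },
    "tts-1-hd": {
        "alloy": {"model": "xtts", "speaker": "voices/alloy-alt.wav"},
    },
}


def check_voice(voice_map, model, voice):
    """Return None if model/voice is configured, else a description of the gap."""
    voices = voice_map.get(model)
    if voices is None:
        return f"model {model!r} is not configured; known models: {sorted(voice_map)}"
    if voice not in voices:
        return (
            f"voice {voice!r} is not configured for {model!r}; "
            f"known voices: {sorted(voices)}"
        )
    return None


if __name__ == "__main__":
    print(check_voice(VOICE_MAP, "tts-1", "alloy"))
    print(check_voice(VOICE_MAP, "tts-1", "English (America, New York City)+male1"))
```

A voice name that the client sends but the config does not list, such as the one from the KeyError earlier in the thread, shows up as the second kind of message.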

matatonic commented 1 month ago

It does seem like the 500 errors have stopped - correct?

eav-solution commented 1 month ago

Sorry, I solved this problem by changing the audio voice in the settings. There are two audio settings, one in the admin settings and one in the user settings; I changed both and it worked. The problem comes from Open WebUI, not this repo.

matatonic commented 1 month ago

Great, thanks for the feedback. I'll close this now; if anyone else is still having problems, please leave a comment.