travisvn / openai-edge-tts

Text-to-speech API endpoint compatible with OpenAI's TTS API endpoint, using Microsoft Edge TTS to generate speech for free locally
https://tts.travisvn.com
GNU General Public License v3.0

Not working with Chinese text #1

Closed: wstlxx closed this issue 1 month ago.

wstlxx commented 1 month ago

So my setup is tested within Open WebUI, and it definitely works with English text. When it comes to Chinese, however, it gives: External: 500 Server Error: INTERNAL SERVER ERROR for url: http://myipaddress:5050/v1/audio/speech

Also, the reason I didn't use localhost is that, from Open WebUI, it somehow did not point to the local TTS service, so I had to use the external public IP address instead. That isn't a big issue, and it's probably a problem on the Open WebUI side.

Other than that, it works like a charm.

travisvn commented 1 month ago

Hello @wstlxx

After seeing your issue, I tested the API endpoint with a language that doesn't use the Latin alphabet, choosing a sentence in Japanese.

It's possible you're getting the error because of an encoding issue in how the call is made; I'm not sure of your exact setup. But when I tested the API, this curl command worked exactly as expected:

```bash
curl -X POST http://localhost:5050/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "tts-1",
    "input": "じゃあ、行く。電車の時間、調べておくよ。",
    "voice": "ja-JP-KeitaNeural"
  }' \
  --output speech.mp3
```
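If the 500 error really does come from encoding, one way to rule that out is to keep the request body in a UTF-8 file and post it byte-for-byte. This is a minimal sketch, assuming the server runs at http://localhost:5050 (overridable through a hypothetical TTS_URL variable) and that API_KEY holds your configured key. The file name request.json, the helper post_speech and the sample sentence are illustrative only; zh-CN-XiaoxiaoNeural is one of the Chinese voices listed further down in this thread.

```bash
#!/usr/bin/env bash
# Sketch: send Chinese input as an exact UTF-8 JSON body.
# Assumptions: server at http://localhost:5050 (override with TTS_URL);
# API_KEY holds your key; request.json and post_speech are hypothetical names.

# A quoted heredoc delimiter ('EOF') stops the shell from expanding anything
# inside, so the file receives exactly the UTF-8 bytes typed here.
cat > request.json <<'EOF'
{
  "model": "tts-1",
  "input": "你好，今天天气很好。",
  "voice": "zh-CN-XiaoxiaoNeural"
}
EOF

post_speech() {
  # $1 = JSON body file, $2 = output audio file.
  # --data-binary posts the file byte-for-byte (-d @file would strip newlines);
  # "charset=utf-8" states the body encoding explicitly.
  curl -sS -X POST "${TTS_URL:-http://localhost:5050}/v1/audio/speech" \
    -H "Content-Type: application/json; charset=utf-8" \
    -H "Authorization: Bearer ${API_KEY:-your_api_key_here}" \
    --data-binary "@$1" \
    --output "$2"
}
```

Usage: post_speech request.json speech.mp3. If this works from the command line but another client still gets a 500, the problem is more likely in how that client builds its request than in the server.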

Maybe you did not select a Chinese voice to accompany your request.

There are several Chinese voices, and there are a few different ways to view them. The easiest, in my opinion, is to install edge-tts on your machine and run edge-tts --list-voices.
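As a minimal sketch of that approach, the hypothetical helper voices_for_locale below pipes the edge-tts listing through grep. It assumes the edge-tts CLI is installed and on your PATH; the grep pattern matches a voice name either at the start of a line or after a space, since the exact listing layout may differ between edge-tts versions.

```bash
#!/usr/bin/env bash
# Sketch: list edge-tts voices for one locale prefix.
# Assumption: the edge-tts CLI is installed (pip install edge-tts) and on PATH.
# voices_for_locale is a hypothetical helper name.
voices_for_locale() {
  local prefix="${1:-zh}"
  # Match the prefix at the start of a line or after a space, so this works
  # whether the listing prints "Name: zh-CN-..." or starts lines with the name.
  edge-tts --list-voices | grep -E "(^| )${prefix}-"
}
```

Usage: voices_for_locale zh prints every Chinese voice, while voices_for_locale zh-TW narrows the list to Taiwan.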

That said, I went ahead and revamped the API endpoints for this project so they can return localized voices more easily.

Here's the commit that includes all of the changes listed below: https://github.com/travisvn/openai-edge-tts/commit/0ff53edcb59f1c129f98fb2cf76d38780a58c8f2

  1. There is now an option to not require an API key. This makes calls easier: if you've set the environment variable to not require a key, you can simply open http://localhost:5050/v1/voices in your browser's address bar.
  2. The /v1/voices endpoint now accepts a language or locale attribute in the request (either as URL search parameters or in a JSON request body) and filters the voices accordingly. For Chinese, set locale to one of the zh- variants (zh-CN, zh-HK, or zh-TW).
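Both changes can also be exercised from the command line. This is a hypothetical sketch (the helper name list_voices and the TTS_URL variable are not part of the project): it passes locale as a URL search parameter using curl's -G and --data-urlencode options, and it only sends an Authorization header when API_KEY is set, which fits the no-key option in point 1.

```bash
#!/usr/bin/env bash
# Sketch: fetch voices for one locale from the /v1/voices endpoint.
# Assumptions: server at http://localhost:5050 (override with TTS_URL);
# leave API_KEY unset when the server is configured not to require a key.
list_voices() {
  local locale="$1"
  local url="${TTS_URL:-http://localhost:5050}/v1/voices"
  local -a auth=()
  # Only send the Bearer header when a key is actually configured.
  if [ -n "${API_KEY:-}" ]; then
    auth=(-H "Authorization: Bearer ${API_KEY}")
  fi
  # -G turns the --data-urlencode pair into a query string on a GET request,
  # i.e. /v1/voices?locale=zh-CN.
  curl -sS -G "$url" --data-urlencode "locale=${locale}" "${auth[@]}"
}
```

With no key required, list_voices zh-CN issues the same request as opening http://localhost:5050/v1/voices?locale=zh-CN in a browser.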

I haven't updated the README yet; I wanted to let you know here first.

On that note, I haven't pushed this update to Docker Hub, so you won't get the new version if you're running the published image directly with something like docker run. Follow the Python instructions in the README to set things up from source and test it all out.

In the meantime, you should be able to generate Chinese TTS as long as you set the voice properly, so selecting one of the voices below should solve your problem:

zh-CN:
- zh-CN-XiaoxiaoNeural (Female)
- zh-CN-XiaoyiNeural (Female)
- zh-CN-YunjianNeural (Male)
- zh-CN-YunxiNeural (Male)
- zh-CN-YunxiaNeural (Male)
- zh-CN-YunyangNeural (Male)
- zh-CN-liaoning-XiaobeiNeural (Female)
- zh-CN-shaanxi-XiaoniNeural (Female)

zh-HK:
- zh-HK-HiuGaaiNeural (Female)
- zh-HK-HiuMaanNeural (Female)
- zh-HK-WanLungNeural (Male)

zh-TW:
- zh-TW-HsiaoChenNeural (Female)
- zh-TW-HsiaoYuNeural (Female)
- zh-TW-YunJheNeural (Male)
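To compare a few of these voices before picking one, a loop like the following renders the same sentence once per voice. It is a sketch under the same assumptions as the curl example earlier in this thread (server on port 5050, Bearer key); sample_voices and TTS_URL are hypothetical names, and the text is placed into the JSON body without escaping, so it must not contain double quotes or backslashes.

```bash
#!/usr/bin/env bash
# Sketch: render one sentence with several voices, one MP3 per voice.
# Assumptions: server at http://localhost:5050 (override with TTS_URL);
# API_KEY holds your key. No JSON escaping is done on the text, so it
# must not contain double quotes or backslashes.
sample_voices() {
  local text="$1"; shift
  local voice
  for voice in "$@"; do
    # Build the body with printf and stream it to curl on stdin (@-).
    printf '{"model": "tts-1", "input": "%s", "voice": "%s"}' "$text" "$voice" |
      curl -sS -X POST "${TTS_URL:-http://localhost:5050}/v1/audio/speech" \
        -H "Content-Type: application/json; charset=utf-8" \
        -H "Authorization: Bearer ${API_KEY:-your_api_key_here}" \
        --data-binary @- \
        --output "${voice}.mp3"
  done
}
```

Usage: sample_voices "你好，很高兴认识你。" zh-CN-XiaoxiaoNeural zh-HK-HiuGaaiNeural zh-TW-HsiaoChenNeural writes zh-CN-XiaoxiaoNeural.mp3, zh-HK-HiuGaaiNeural.mp3 and zh-TW-HsiaoChenNeural.mp3.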

-Travis