Closed wstlxx closed 1 month ago
Hello @wstlxx
After seeing your opened issue, I tested the API endpoint with a non-Latin alphabet language and chose a sentence in Japanese.
It's possible you're getting the error due to encoding issues with how the call is made — I am not sure of your scenario. But when testing the API, running this curl command worked exactly as expected:
curl -X POST http://localhost:5050/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here" \
-d '{
"model": "tts-1",
"input": "じゃあ、行く。電車の時間、調べておくよ。",
"voice": "ja-JP-KeitaNeural"
}' \
--output speech.mp3
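If you are calling the endpoint from code rather than curl, here is a minimal Python sketch of the same request. It encodes the JSON body explicitly as UTF-8, which rules out one possible source of encoding trouble with non-Latin text. The helper names `build_speech_request` and `save_speech` are made up for illustration and are not part of the project; the URL, model, and voice are taken from the curl command above.

```python
import json
import urllib.request


def build_speech_request(text, voice, base_url="http://localhost:5050",
                         api_key="your_api_key_here"):
    """Build (but do not send) a POST request for /v1/audio/speech.

    Hypothetical helper: the name and defaults are assumptions.
    """
    payload = {"model": "tts-1", "input": text, "voice": voice}
    # ensure_ascii=False keeps the characters as-is; the explicit encode
    # guarantees the bytes on the wire are UTF-8.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def save_speech(request, path="speech.mp3"):
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

With the server running, something like `save_speech(build_speech_request("じゃあ、行く。電車の時間、調べておくよ。", "ja-JP-KeitaNeural"))` should produce the same file as the curl call.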
Maybe you did not select a Chinese voice to accompany your request.
There are several Chinese voices, which can be viewed by a few different methods. The easiest, in my opinion, is to install edge-tts on your machine and run edge-tts --list-voices.
However, I went ahead and revamped the API endpoints for this project to more easily return localized voices.
Here's the commit that includes all of the points I'm about to list below: https://github.com/travisvn/openai-edge-tts/commit/0ff53edcb59f1c129f98fb2cf76d38780a58c8f2
You can open http://localhost:5050/v1/voices in your browser URL bar if you've set the environment variable to not require an API key.
The /v1/voices endpoint now accepts and correctly filters voices when provided a language or locale attribute in the request (be it through URL search parameters or a JSON request body). You'll want to set locale to one of the zh- variants (zh-CN, zh-HK, or zh-TW).
I have not updated the README; I came here to let you know first.
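As a rough illustration of the URL search parameter route, the Python sketch below builds the filtered /v1/voices URL and fetches it. The helper names are made up for this example, the API key is a placeholder, and the response is returned as parsed JSON without assuming anything about its shape, since that is not described here.

```python
import json
import urllib.request
from urllib.parse import urlencode


def voices_url(locale, base_url="http://localhost:5050"):
    """Return the /v1/voices URL filtered by locale (hypothetical helper)."""
    return f"{base_url}/v1/voices?{urlencode({'locale': locale})}"


def fetch_voices(locale, base_url="http://localhost:5050", api_key=None):
    """GET the filtered voice list and return the parsed JSON as-is."""
    request = urllib.request.Request(voices_url(locale, base_url))
    if api_key:
        # Only needed when the server is configured to require an API key.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_voices("zh-CN")` requests http://localhost:5050/v1/voices?locale=zh-CN.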
On that note, I have not pushed this update to Docker Hub, so you won't be able to use this updated version if you're running straight through with something like docker run. Follow the Python instructions in the README to be able to dynamically set things up and test it all out.
In the interim, you should be able to generate Chinese TTS if you set the voice properly. So selecting one of the voices below should solve your problem:
Name: zh-CN-XiaoxiaoNeural Gender: Female
Name: zh-CN-XiaoyiNeural Gender: Female
Name: zh-CN-YunjianNeural Gender: Male
Name: zh-CN-YunxiNeural Gender: Male
Name: zh-CN-YunxiaNeural Gender: Male
Name: zh-CN-YunyangNeural Gender: Male
Name: zh-CN-liaoning-XiaobeiNeural Gender: Female
Name: zh-CN-shaanxi-XiaoniNeural Gender: Female
Name: zh-HK-HiuGaaiNeural Gender: Female
Name: zh-HK-HiuMaanNeural Gender: Female
Name: zh-HK-WanLungNeural Gender: Male
Name: zh-TW-HsiaoChenNeural Gender: Female
Name: zh-TW-HsiaoYuNeural Gender: Female
Name: zh-TW-YunJheNeural Gender: Male
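If you'd rather pick a voice in code, here is a small Python sketch that parses lines in the "Name: ... Gender: ..." form shown above and filters them by locale prefix. The function names are hypothetical, and the sample only includes three of the voices from the list.

```python
SAMPLE_LISTING = """\
Name: zh-CN-XiaoxiaoNeural Gender: Female
Name: zh-HK-WanLungNeural Gender: Male
Name: zh-TW-YunJheNeural Gender: Male
"""


def parse_voices(listing):
    """Turn 'Name: X Gender: Y' lines into a list of dicts."""
    voices = []
    for line in listing.splitlines():
        parts = line.split()
        # Skip anything that does not match the expected four-token layout.
        if len(parts) == 4 and parts[0] == "Name:" and parts[2] == "Gender:":
            voices.append({"name": parts[1], "gender": parts[3]})
    return voices


def voices_for_locale(voices, locale):
    """Keep only voices whose name starts with the given locale, e.g. 'zh-CN'."""
    return [voice for voice in voices if voice["name"].startswith(locale + "-")]


if __name__ == "__main__":
    for voice in voices_for_locale(parse_voices(SAMPLE_LISTING), "zh-CN"):
        print(voice["name"], voice["gender"])
```

The name of whichever voice you pick goes in the "voice" field of the request body.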
-Travis
My setup is tested within Open WebUI and works with English text for sure. When it comes to Chinese, it gives: External: 500 Server Error: INTERNAL SERVER ERROR for url: http://myipaddress:5050/v1/audio/speech
Also, the reason I didn't use localhost is that it somehow did not point to the local TTS service from Open WebUI, so I had to use the external public IP address instead. That isn't a big issue and is probably a problem on the Open WebUI side.
Other than that, it works like a charm.