matatonic / openedai-speech

An OpenAI API-compatible text-to-speech server using Coqui AI's xtts_v2 and/or piper tts as the backend.
GNU Affero General Public License v3.0
479 stars, 63 forks

When using the API to read Chinese, the return is NoneType #64

Open: luobendewugong opened this issue 2 months ago

luobendewugong commented 2 months ago

Hello, I really appreciate your work. I have deployed it on Ubuntu and tried to have it read Chinese.

I downloaded zh_CN-huayan-medium.onnx and zh_CN-huayan-x_low.onnx from https://hf-mirror.com/rhasspy/piper-voices/tree/main/zh/zh_CN/huayan/medium and https://hf-mirror.com/rhasspy/piper-voices/tree/main/zh/zh_CN/huayan/x_low, and placed them in the voices folder.

I downloaded config.json and model.pth from https://hf-mirror.com/coqui/XTTS-v2/tree/main and placed them in the .local/share/tts/tts_models--multilingual--multi-dataset--xtts folder.

After running python speech.py, the following error occurred. I suspect it is because the text to be read was never passed in.

Could you kindly help me? Thank you!
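
Because the server is OpenAI API compatible, the text to read travels in the "input" field of a POST to /v1/audio/speech. Below is a minimal client sketch, assuming the server listens on localhost:8000 and that "tts-1-hd" (xtts) and the voice "alloy" are available; adjust the model and voice to whatever your server is configured with.

```python
import json
import urllib.request

# Assumption: the server started with `python speech.py` listens on localhost:8000.
SPEECH_URL = "http://localhost:8000/v1/audio/speech"


def build_speech_request(text: str, voice: str = "alloy", model: str = "tts-1-hd",
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build an OpenAI-style /v1/audio/speech POST request carrying the text to read."""
    payload = {
        "model": model,        # assumption: tts-1 -> piper, tts-1-hd -> xtts
        "input": text,         # the text to be read aloud
        "voice": voice,        # assumption: must match a voice configured on the server
        "response_format": response_format,
    }
    # Keep Chinese characters as UTF-8 instead of \uXXXX escapes.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request("你好,世界。")
    # Save the returned audio bytes to a file.
    with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as out:
        out.write(resp.read())
```

If the request succeeds, the server log shows a line like the "POST /v1/audio/speech HTTP/1.1" 200 OK entry quoted later in this thread.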

bug
matatonic commented 2 months ago

The error seems to indicate that the tokenizer is missing, so perhaps you missed a file?

Secondly, why are you downloading the models manually? They are downloaded automatically as needed. Maybe you have a good reason; I know downloads from Hugging Face can be blocked in some regions. Without a good reason, though, you're just making things harder.

matatonic commented 2 months ago

For xtts, the folder should look like this:

openedai-speech/voices/tts$ ls tts_models--multilingual--multi-dataset--xtts/
config.json  vocab.json  hash.md5  model.pth  speakers_xtts.pth
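
A quick way to confirm nothing is missing is a small check script. This is a minimal sketch, not part of openedai-speech: the file list is taken from the listing above, and the relative path assumes you run it from the directory that contains openedai-speech/.

```python
from pathlib import Path
from typing import List

# File names taken from the maintainer's folder listing.
REQUIRED_XTTS_FILES = ("config.json", "vocab.json", "hash.md5", "model.pth", "speakers_xtts.pth")

# Assumption: run from the directory that contains openedai-speech/.
XTTS_DIR = Path("openedai-speech/voices/tts/tts_models--multilingual--multi-dataset--xtts")


def missing_xtts_files(model_dir: Path = XTTS_DIR) -> List[str]:
    """Return the expected XTTS file names that are not present in model_dir."""
    return [name for name in REQUIRED_XTTS_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_xtts_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected XTTS files are present.")
```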

luobendewugong commented 2 months ago

> The error seems to indicate that the tokenizer is missing, so perhaps you missed a file?
>
> Secondly, why are you downloading the models manually? They are downloaded automatically as needed. Maybe you have a good reason; I know downloads from Hugging Face can be blocked in some regions. Without a good reason, though, you're just making things harder.

Thank you very much for your reply. Since I am unable to access https://huggingface.co/, I tried adding the following lines at the beginning of speech.py:

import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

But it didn't work, even though the same approach works in my other programs. In the end, I had no choice but to download the models manually. I also tried Docker, but ran into the same model-download problem.
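
Since manual downloading was the workaround here, the sketch below fetches the five files from the maintainer's listing through the mirror. It is a hypothetical helper, not part of openedai-speech, and it assumes hf-mirror.com serves files under the same <repo>/resolve/<revision>/<filename> path as huggingface.co. Separately, huggingface_hub reads HF_ENDPOINT when it is first imported, so setting it through os.environ only has an effect if nothing has imported huggingface_hub earlier, and it does not change downloads made by code that bypasses huggingface_hub.

```python
import urllib.request
from pathlib import Path

# Assumption: the mirror uses the same URL layout as huggingface.co.
MIRROR = "https://hf-mirror.com"
REPO_ID = "coqui/XTTS-v2"
XTTS_FILES = ("config.json", "vocab.json", "hash.md5", "model.pth", "speakers_xtts.pth")


def resolve_url(filename: str, repo_id: str = REPO_ID, revision: str = "main",
                endpoint: str = MIRROR) -> str:
    """Build a direct-download URL for one file in a Hugging Face-style repo."""
    return f"{endpoint.rstrip('/')}/{repo_id}/resolve/{revision}/{filename}"


def download_all(target_dir: Path) -> None:
    """Download every listed XTTS file into target_dir, skipping files already present."""
    target_dir.mkdir(parents=True, exist_ok=True)
    for name in XTTS_FILES:
        dest = target_dir / name
        if dest.exists():
            continue
        url = resolve_url(name)
        print("Downloading", url)
        urllib.request.urlretrieve(url, dest)


if __name__ == "__main__":
    # Assumption: run from inside the openedai-speech/ directory.
    download_all(Path("voices/tts/tts_models--multilingual--multi-dataset--xtts"))
```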

luobendewugong commented 2 months ago

> For xtts, the folder should look like this:
>
> openedai-speech/voices/tts$ ls tts_models--multilingual--multi-dataset--xtts/
> config.json  vocab.json  hash.md5  model.pth  speakers_xtts.pth

Indeed, I had not downloaded all the files; I only downloaded config.json and model.pth. Thank you very much for your detailed explanation. I'll try again.

I also have two more questions:

  1. No matter how I configure it, even if I set the model to xtts_v2.0.2, the model reverts to xtts after one use when performing TTS. Is my xtts version setting wrong? Should it be xtts_v2.0.2 or xtts_v2, and where do I configure this?
  2. What should the folder path for xtts_v2 be, and which files should it contain?

Thank you very much for your reply!

luobendewugong commented 2 months ago

> For xtts, the folder should look like this:
>
> openedai-speech/voices/tts$ ls tts_models--multilingual--multi-dataset--xtts/
> config.json  vocab.json  hash.md5  model.pth  speakers_xtts.pth

On https://hf-mirror.com/coqui/XTTS-v1/tree/main, there seem to be no hash.md5 or speakers_xtts.pth files. These two files aren't necessary, are they? Even after downloading the other three files and placing them in the directory, I still run into problems.

[Screenshot 2024-09-21 082423]

matatonic commented 2 months ago

You want coqui/XTTS-v2.

[Screenshot_20240920-210952.png]

matatonic commented 2 months ago

I'll look into why

HF_ENDPOINT=https://hf-mirror.com

doesn't work and see if I can fix it.

matatonic commented 2 months ago

> I also have two more questions:
>
> 1. No matter how I configure it, even if I set the model to xtts_v2.0.2, the model reverts to xtts after one use when performing TTS. Is my xtts version setting wrong? Should it be xtts_v2.0.2 or xtts_v2, and where do I configure this?

If no version is set, 'xtts' uses the latest version, which is xtts_v2.0.2.

> 2. What should the folder path for xtts_v2 be, and which files should it contain?

It's the folder I mentioned at the beginning. Sorry, I'm on mobile; I can be more detailed if needed.

luobendewugong commented 2 months ago

Your explanation has helped me a lot, thank you very much! After I put all the files into tts_models--multilingual--multi-dataset--xtts, the previous issues were resolved, but the following problem appeared:

INFO:     127.0.0.1:46278 - "POST /v1/audio/speech HTTP/1.1" 200 OK
2024/09/21 13:12:36.681500 cmd_run.go:1138: WARNING: cannot start document portal: dial unix /run/user/1000/bus: connect: no such file or directory

Additionally, for some reason, "Loading model xtts to cuda" is very slow, taking about 5 minutes.

Thank you very much for your reply!

luobendewugong commented 2 months ago

I reinstalled ffmpeg, and now it runs smoothly! Thank you very much! However, it seems unable to read text that mixes Chinese and English.
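
Reinstalling ffmpeg fixed the error, which suggests the server depends on a working ffmpeg binary (presumably for encoding the audio response; that is an inference from this thread, not something it states). A minimal sketch for checking which ffmpeg, if any, is on PATH:

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    # check=False: a broken install should report, not raise.
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    first_line = result.stdout.splitlines()[:1]
    return first_line[0] if first_line else None


if __name__ == "__main__":
    version = ffmpeg_version()
    print(version or "ffmpeg not found on PATH; try (re)installing it.")
```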

For some reason, "Loading model xtts to cuda" is very slow, taking about 5 minutes.

matatonic commented 2 months ago

Try the dev branch, which supports multiple languages at the request level. Would it be desirable to support multiple languages at the sentence level?
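
To make the sentence-level idea concrete, here is a hypothetical client-side sketch (not how openedai-speech works) that splits mixed Chinese/English text into runs by script, so that each run could be synthesized with its own language. The labels "zh-cn" and "en" follow XTTS language codes; the character ranges are an assumption covering common CJK ideographs, CJK punctuation, and full-width forms.

```python
import re
from typing import List, Tuple

# Assumption: CJK symbols/punctuation, CJK Unified Ideographs, and full-width forms.
_CJK = r"\u3000-\u303f\u4e00-\u9fff\uff00-\uffef"
_SEGMENT = re.compile(rf"[{_CJK}]+|[^{_CJK}]+")
_IS_CJK = re.compile(rf"[{_CJK}]")


def split_by_language(text: str) -> List[Tuple[str, str]]:
    """Split mixed text into (language, segment) runs: 'zh-cn' for CJK runs, 'en' otherwise."""
    segments = []
    for match in _SEGMENT.finditer(text):
        chunk = match.group()
        if not chunk.strip():
            continue  # drop whitespace-only runs between languages
        lang = "zh-cn" if _IS_CJK.match(chunk) else "en"
        segments.append((lang, chunk.strip()))
    return segments


if __name__ == "__main__":
    for lang, segment in split_by_language("我喜欢Python编程。"):
        print(lang, segment)
```

ASCII digits and punctuation land in "en" runs, so a real implementation would need smarter handling of characters shared by both languages.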

matatonic commented 2 months ago

Re: the 5-minute wait, that is odd. Which GPU are you using? Models are loaded on demand by default.

zhai-hello commented 1 month ago

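Fellow countryman, could I trouble you to walk me through exactly how you got it working? I'm also using Open WebUI with this project for text-to-speech, but following the official deployment instructions I still can't get it to work. Could you take a look at my issue? Sorry for the trouble. https://github.com/matatonic/openedai-speech/issues/66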