huggingface / speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o
Apache License 2.0
3.57k stars 377 forks

Chinese is not recognized with the default model #79

Open luobendewugong opened 2 months ago

luobendewugong commented 2 months ago

Hello, thanks for your work. I have two questions about the default model that I'd like to ask you:

  1. It can only recognize English. Where do I set it to recognize Chinese, or do I need to replace it with another model?
  2. The recognition quality is not good, and the spoken answer is not continuous. Do I need to replace the TTS model with another one? The default model is actually quite big.

Thank you so much!

Lbaiall commented 2 months ago

You should change the model from HF.

andimarafioti commented 2 months ago

Using the code from this PR: https://github.com/huggingface/speech-to-speech/pull/60

You can call the system with: python s2s_pipeline.py --recv_host 0.0.0.0 --send_host 0.0.0.0 --lm_model_name meta-llama/Meta-Llama-3.1-8B-Instruct --init_chat_role system --tts melo --stt_model_name openai/whisper-large-v3 --language zh
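As an illustration of what the `--language` flag does (a minimal sketch, not the repo's actual code; the function and dictionary names here are assumed), a single language value is fanned out to the pipeline's stages, while `auto` leaves each stage to detect the language itself:

```python
# Illustrative sketch: fan a single --language value out to the STT and TTS
# stages. "auto" means each stage auto-detects; a fixed code like "zh" is
# forced on every stage. Names are hypothetical, not from this repo.
def fan_out_language(language):
    if language == "auto":
        return {"stt": {}, "tts": {}}  # each stage detects the language itself
    return {"stt": {"language": language}, "tts": {"language": language}}
```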

andimarafioti commented 2 months ago

There, Whisper is larger than the distil version (but it works for Chinese). The LLM is larger (but it works for Chinese, and you can change it for another one). The TTS is smaller than the default (and it works for Chinese).

andimarafioti commented 2 months ago

Let me know if it works :)

andimarafioti commented 2 months ago

I merged the PR for multiple languages, so you should be able to run with the code in main.

Kong4Git commented 2 months ago

Hi, thanks for your work. I encountered an issue while running the code from your repository on my Mac. The error I received is as follows: ValueError: Please select a valid model

The error occurs when initializing the LightningWhisperMLX model with the following command: python s2s_pipeline.py --local_mac_optimal_settings --device mps --lm_model_name meta-llama/Meta-Llama-3.1-8B-Instruct --init_chat_role system --tts melo --stt_model_name openai/whisper-large-v3 --language zh

could you please provide some guidance on what might be causing this issue or suggest any potential solutions?

Thank you very much for your help!

andimarafioti commented 2 months ago

You can run it on mac with:

python s2s_pipeline.py  --device mps --lm_model_name meta-llama/Meta-Llama-3.1-8B-Instruct --init_chat_role system --tts melo --stt_model_name openai/whisper-large-v3 --language zh --mode local

But we still didn't make the changes to the MLX classes to support Chinese, so the generation will be quite slow.
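For context on the ValueError above: the lightning-whisper-mlx package expects short model names such as "large-v3" rather than Hugging Face repo ids such as "openai/whisper-large-v3". A minimal sketch of an adapter (illustrative only, not code from this repo) could look like:

```python
# Hedged sketch: convert a Hugging Face Whisper repo id into the short model
# name that LightningWhisperMLX validates against. This helper is
# hypothetical and not part of the speech-to-speech codebase.
def to_mlx_model_name(hf_name):
    """Strip the org prefix and the 'whisper-' stem from an HF repo id."""
    short = hf_name.split("/")[-1]           # "openai/whisper-large-v3" -> "whisper-large-v3"
    if short.startswith("whisper-"):
        short = short[len("whisper-"):]      # -> "large-v3"
    return short
```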

andimarafioti commented 2 months ago

If you want to make the changes, we welcome PRs! Otherwise, I'll adapt it in the coming days.

luobendewugong commented 2 months ago

Thank you very much, I can use Chinese now, but there are three more things I want to ask:

  1. Modifying 'init_chat_prompt' in 'LLM\language_model.py' doesn't seem to have any effect; no matter how I modify it, the LLM's answers don't change.
  2. '--language None' seems to be related to the language model. When I use qwen2-1.5b and run without the '--language None' mode, I have to choose the language.
  3. Can I use a model in GGUF format?

> Using the code from this PR: #60
>
> You can call the system with: python s2s_pipeline.py --recv_host 0.0.0.0 --send_host 0.0.0.0 --lm_model_name meta-llama/Meta-Llama-3.1-8B-Instruct --init_chat_role system --tts melo --stt_model_name openai/whisper-large-v3 --language zh

andimarafioti commented 2 months ago
  1. For init_chat_prompt to take effect, you also need to set init_chat_role.
  2. We changed it to '--language auto' because we thought it was more intuitive. In any case, it's related to everything: setting 'language auto' makes everything auto-detect, and setting 'language zh' should make everything Chinese.
  3. Do you mean for the LLM? I think you should be able to. Try it out and report back to me!
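To illustrate point 1 above (a minimal sketch with assumed names, not the repo's exact code): the system prompt is only injected into the chat history when init_chat_role is set, which is why editing init_chat_prompt alone appears to have no effect.

```python
# Hypothetical sketch of the init-chat gating described above: the prompt is
# only added when a role is provided. Function and field names are assumed.
def build_initial_chat(init_chat_role, init_chat_prompt):
    chat = []
    if init_chat_role:  # e.g. "system"
        chat.append({"role": init_chat_role, "content": init_chat_prompt})
    return chat
```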