Open zscxc opened 2 days ago
Hello, @zscxc! I'm here to help you with any bugs, questions, or contributions you might have. Ask me anything! 😊
The issue you're experiencing with timbre switching for the CosyVoice-300M-Instruct model might be related to how the payload is constructed and sent to the API. The logic for handling different voice types is implemented in the _build_tts_payload method. This method constructs the payload based on the model_type parameter, which can be one of the following: PresetVoice, CloneVoice, CloneVoice_CrossLingual, or InstructVoice.
Here is the relevant code snippet:
def _build_tts_payload(
    self,
    model_type: str,
    content_text: str,
    model_role: str,
    prompt_text: str,
    prompt_audio: str,
    instruct_text: str,
):
    if model_type == TTSModelType.PresetVoice.value and model_role:
        return {"tts_text": content_text, "role": model_role}
    if model_type == TTSModelType.CloneVoice.value and prompt_text and prompt_audio:
        return {"tts_text": content_text, "prompt_text": prompt_text, "prompt_audio": prompt_audio}
    if model_type == TTSModelType.CloneVoice_CrossLingual.value and prompt_audio:
        lang_tag = self._detect_lang_code(content_text)
        return {"tts_text": f"{content_text}", "prompt_audio": prompt_audio, "lang_tag": lang_tag}
    if model_type == TTSModelType.InstructVoice.value and instruct_text and model_role:
        return {"tts_text": content_text, "role": model_role, "instruct_text": instruct_text}
    raise RuntimeError(f"Invalid params for {model_type}")
This method is called within the _invoke method to generate the appropriate payload for the text-to-speech request [1].
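To see how the branches behave in isolation, here is a minimal standalone sketch of the same dispatch logic. It is not the Dify implementation: the enum values are assumed to equal the member names (the snippet above shows only the names), the lang_tag argument is a hypothetical stand-in for the result of _detect_lang_code, and the role name "male_voice" is made up for illustration.

```python
from enum import Enum


class TTSModelType(Enum):
    # Assumption: the real enum values are not shown in the thread.
    PresetVoice = "PresetVoice"
    CloneVoice = "CloneVoice"
    CloneVoice_CrossLingual = "CloneVoice_CrossLingual"
    InstructVoice = "InstructVoice"


def build_tts_payload(model_type, content_text, model_role="", prompt_text="",
                      prompt_audio="", instruct_text="", lang_tag=""):
    """Standalone copy of the branch logic quoted above.

    lang_tag is a hypothetical replacement for self._detect_lang_code().
    """
    if model_type == TTSModelType.PresetVoice.value and model_role:
        return {"tts_text": content_text, "role": model_role}
    if model_type == TTSModelType.CloneVoice.value and prompt_text and prompt_audio:
        return {"tts_text": content_text, "prompt_text": prompt_text,
                "prompt_audio": prompt_audio}
    if model_type == TTSModelType.CloneVoice_CrossLingual.value and prompt_audio:
        return {"tts_text": content_text, "prompt_audio": prompt_audio,
                "lang_tag": lang_tag}
    if model_type == TTSModelType.InstructVoice.value and instruct_text and model_role:
        return {"tts_text": content_text, "role": model_role,
                "instruct_text": instruct_text}
    raise RuntimeError(f"Invalid params for {model_type}")


# InstructVoice needs both instruct_text and model_role.
payload = build_tts_payload(
    "InstructVoice", "Hello", model_role="male_voice", instruct_text="Speak calmly"
)
print(payload)
```

Note that in this logic an InstructVoice request with an empty model_role does not fall back to a default voice; it falls through every branch and raises RuntimeError.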
To troubleshoot the issue, you can check the following:
1. Ensure the model_type parameter is correctly set to the desired voice type.
2. Ensure the other parameters (model_role, prompt_text, prompt_audio, instruct_text) are correctly provided based on the selected model_type.
If the parameters are correctly set and the issue persists, there might be a bug in the implementation or a misconfiguration in the model settings. You may need to review the configuration files and ensure that the model is correctly initialized and configured to handle the specified voice types.
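The two checks can be sketched as a small pre-flight helper. The required fields per voice type are read directly off the conditions in _build_tts_payload; the helper name missing_fields and the use of plain strings for model_type (assuming the enum values match the member names) are illustrative assumptions, not part of Dify.

```python
# Fields that must be non-empty for each voice type, derived from the
# conditions in the _build_tts_payload snippet quoted in this thread.
REQUIRED_FIELDS = {
    "PresetVoice": ("model_role",),
    "CloneVoice": ("prompt_text", "prompt_audio"),
    "CloneVoice_CrossLingual": ("prompt_audio",),
    "InstructVoice": ("instruct_text", "model_role"),
}


def missing_fields(model_type, params):
    """Return the required parameter names that are absent or empty."""
    if model_type not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    return [name for name in REQUIRED_FIELDS[model_type] if not params.get(name)]


# An InstructVoice request without a role is incomplete.
print(missing_fields("InstructVoice", {"instruct_text": "Speak calmly"}))
```

An empty list means the parameters satisfy the branch for that voice type; a non-empty list names what would cause the "Invalid params" error.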
To continue talking to Dosu, mention @dosu.
Should I implement this timbre logic in Dify myself?
The timbre switching logic is already implemented in Dify within the _build_tts_payload method. This method constructs the payload based on the model_type parameter and other related parameters like model_role, prompt_text, prompt_audio, and instruct_text [1].
To ensure the timbre switching works correctly, you should:
1. Ensure the model_type parameter is set to the correct voice type (e.g., PresetVoice, CloneVoice, CloneVoice_CrossLingual, or InstructVoice).
2. Ensure the other parameters (model_role, prompt_text, prompt_audio, instruct_text) are correctly provided based on the selected model_type.
If these parameters are correctly set and the issue persists, it might be due to a bug or misconfiguration. You should review the configuration files and ensure the model is correctly initialized and configured to handle the specified voice types. The configuration file sagemaker_tts.yaml specifies the available voice types and their corresponding parameters [2].
If everything seems correct and the issue still exists, you might need to debug the payload construction and API request handling to identify any discrepancies.
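One way to debug the payload construction is to capture the payload for two different timbre selections and compare them. The sketch below is a generic helper under that assumption; log_payload, changed_keys, and the role names are hypothetical and not part of Dify. If switching the voice in the UI produces no change in the "role" field, the selection is being lost before the payload is built; if "role" does change but the audio does not, the problem is more likely on the model endpoint side.

```python
import json
import logging

logger = logging.getLogger("tts_debug")


def log_payload(payload):
    """Serialize a payload deterministically and write it to the debug log."""
    text = json.dumps(payload, ensure_ascii=False, sort_keys=True)
    logger.debug("TTS payload: %s", text)
    return text


def changed_keys(before, after):
    """Return the sorted keys whose values differ between two payloads."""
    return sorted(k for k in set(before) | set(after) if before.get(k) != after.get(k))


# Hypothetical payloads captured for two different voice selections.
first = {"tts_text": "Hello", "role": "female_voice", "instruct_text": "Speak calmly"}
second = {"tts_text": "Hello", "role": "male_voice", "instruct_text": "Speak calmly"}
log_payload(first)
log_payload(second)
print(changed_keys(first, second))
```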
Self Checks
Dify version
0.8.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Select the male voice and click play: the output is a female voice, not a male one. After switching voices, the output is still the same Chinese female voice. Is there somewhere this can be adjusted? The model I used was CosyVoice-300M-Instruct.
✔️ Expected Behavior
I want to be able to adjust the sound when calling through the API
❌ Actual Behavior
No matter which timbre is selected, the output voice does not change.