Closed cycwll closed 1 week ago
Hey @cycwll! I'm here to help you with any bugs, questions, or contributions you have for the Dify repository. Let's work together to solve any issues you're encountering.
To resolve the issue where the `voice` parameter is always empty when passed from Dify to xinference in your self-hosted Dify version 0.6.16 running on Docker, you need to ensure that the `voice` parameter is correctly handled and passed through the relevant functions and API calls.

From the provided code, it appears that the `XinferenceModelExtraParameter` class and related methods do not include a `voice` parameter. You will need to modify the code to include this parameter where necessary.

Here is an example of how you can modify the `XinferenceModelExtraParameter` class and the related methods to include the `voice` parameter:

1. Extend the `XinferenceModelExtraParameter` class to include the `voice` parameter:

```python
class XinferenceModelExtraParameter:
    model_format: str
    model_handle_type: str
    model_ability: list[str]
    max_tokens: int = 512
    context_length: int = 2048
    support_function_call: bool = False
    support_vision: bool = False
    voice: str = ""  # Add the voice parameter

    def __init__(self, model_format: str, model_handle_type: str, model_ability: list[str],
                 support_function_call: bool, support_vision: bool, max_tokens: int,
                 context_length: int, voice: str) -> None:
        self.model_format = model_format
        self.model_handle_type = model_handle_type
        self.model_ability = model_ability
        self.support_function_call = support_function_call
        self.support_vision = support_vision
        self.max_tokens = max_tokens
        self.context_length = context_length
        self.voice = voice  # Initialize the voice parameter
```
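As a quick sanity check, the extended constructor can be exercised standalone. This is a condensed sketch for illustration only (the real class lives in Dify's xinference provider module); the sample voice value is taken from the CosyVoice speaker list quoted later in this thread:

```python
# Condensed standalone sketch of the extended parameter class, just to show
# that the new voice field round-trips through the constructor.
class XinferenceModelExtraParameter:
    def __init__(self, model_format: str, model_handle_type: str,
                 model_ability: list, support_function_call: bool,
                 support_vision: bool, max_tokens: int,
                 context_length: int, voice: str = "") -> None:
        self.model_format = model_format
        self.model_handle_type = model_handle_type
        self.model_ability = model_ability
        self.support_function_call = support_function_call
        self.support_vision = support_vision
        self.max_tokens = max_tokens
        self.context_length = context_length
        self.voice = voice

param = XinferenceModelExtraParameter(
    model_format="pytorch", model_handle_type="chat",
    model_ability=["chat"], support_function_call=False,
    support_vision=False, max_tokens=512, context_length=2048,
    voice="中文男",
)
print(param.voice)  # → 中文男
```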
2. Update the `_get_xinference_extra_parameter` method to extract the `voice` parameter from the response:

```python
@staticmethod
def _get_xinference_extra_parameter(server_url: str, model_uid: str) -> XinferenceModelExtraParameter:
    if not model_uid or not model_uid.strip() or not server_url or not server_url.strip():
        raise RuntimeError('model_uid or server_url is empty')

    url = str(URL(server_url) / 'v1' / 'models' / model_uid)

    session = Session()
    session.mount('http://', HTTPAdapter(max_retries=3))
    session.mount('https://', HTTPAdapter(max_retries=3))

    try:
        response = session.get(url, timeout=10)
    except (MissingSchema, ConnectionError, Timeout) as e:
        raise RuntimeError(f'get xinference model extra parameter failed, url: {url}, error: {e}')
    if response.status_code != 200:
        raise RuntimeError(f'get xinference model extra parameter failed, status code: {response.status_code}, response: {response.text}')

    response_json = response.json()

    model_format = response_json.get('model_format', 'ggmlv3')
    model_ability = response_json.get('model_ability', [])
    voice = response_json.get('voice', '')  # Extract the voice parameter

    if response_json.get('model_type') == 'embedding':
        model_handle_type = 'embedding'
    elif model_format == 'ggmlv3' and 'chatglm' in response_json['model_name']:
        model_handle_type = 'chatglm'
    elif 'generate' in model_ability:
        model_handle_type = 'generate'
    elif 'chat' in model_ability:
        model_handle_type = 'chat'
    else:
        raise NotImplementedError('xinference model handle type is not supported')

    support_function_call = 'tools' in model_ability
    support_vision = 'vision' in model_ability
    max_tokens = response_json.get('max_tokens', 512)
    context_length = response_json.get('context_length', 2048)

    return XinferenceModelExtraParameter(
        model_format=model_format,
        model_handle_type=model_handle_type,
        model_ability=model_ability,
        support_function_call=support_function_call,
        support_vision=support_vision,
        max_tokens=max_tokens,
        context_length=context_length,
        voice=voice  # Pass the voice parameter
    )
```
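Note that this extraction only works if xinference actually returns a top-level `voice` field from `/v1/models/<model_uid>`, which is an assumption worth verifying against your build; if the field is absent, the lookup falls back to an empty string, which matches the empty `voice` seen in your logs. A minimal sketch of that fallback against a mocked payload:

```python
# Mocked /v1/models/<uid> payload for a TTS model. Whether xinference returns
# a top-level "voice" field here is an assumption -- if it does not, the
# .get() lookup below yields '' and the empty voice propagates downstream.
response_json = {
    "model_name": "CosyVoice-300M-SFT",
    "model_type": "audio",
    "model_ability": ["text-to-audio"],
}

voice = response_json.get("voice", "")  # same lookup as in the patch above
print(repr(voice))  # → ''
```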
Additionally, ensure that your Dify Docker deployment is correctly configured to access the Xinference endpoint. According to the Dify documentation, you need to set the server URL to the machine's IP address instead of `localhost`:

> When deploying Dify via Docker, pay attention to the network configuration and make sure the Dify container can reach the Xinference endpoint. The Dify container cannot access `localhost`; use the host machine's IP address instead.
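As a concrete illustration of that note (the IP address, port, and container name below are placeholders, not values from this thread):

```shell
# Inside the Dify container, "localhost" resolves to the container itself,
# so a Xinference server URL of http://localhost:9997 will never connect.
# Point the URL at the host instead:
#
#   wrong: http://localhost:9997
#   right: http://192.168.1.100:9997         # your host's LAN IP (placeholder)
#   also:  http://host.docker.internal:9997  # Docker Desktop host alias

# Quick connectivity check from inside the Dify API container
# (container name is a placeholder -- check `docker ps` for yours):
docker exec -it docker-api-1 curl http://192.168.1.100:9997/v1/models
```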
By making these changes, you ensure that the `voice` parameter is included in the `XinferenceModelExtraParameter` class, is correctly extracted from the response, and is passed through the relevant methods. Additionally, ensure that your network settings allow Dify to access the Xinference endpoint [1][2][3].
To continue talking to Dosu, mention @dosu.
Does Xinference not support TTS models?
My Dify is running via docker compose. The network from Dify to xinference is not the problem: the interface can be called and text can be converted to speech, but the voice parameter is not passed to xinference.
In the xinference log, the voice parameter is still empty:
2024-08-07 01:02:20,162 xinference.core.model 479 DEBUG Enter wrapped_func, args: (<xinference.core.model.ModelActor object at 0x7fdad4bc0a40>,), kwargs: {'input': '请联系技术支持团队获取帮助。', 'voice': '', 'response_format': 'mp3', 'speed': 1.0, 'stream': False}
@dosu thanks for your help!
I upgraded to v0.7.0, but the situation has become worse. The "voice" parameter has changed to "default", causing this xinference error:
xinference.core.model 479 DEBUG Enter wrapped_func, args: (<xinference.core.model.ModelActor object at 0x7fdad4bc0a40>,), kwargs: {'input': '请联系技术支持团队获取帮助。', 'voice': 'default', 'response_format': 'mp3', 'speed': 1.0, 'stream': False}
"AssertionError: [address=0.0.0.0:43313, pid=467] Invalid voice default, CosyVoice available speakers: ['中文女', '中文男', '日语男', '粤语女', '英文女', '英文男', '韩语女']"
To ensure that the correct `voice` parameter is passed to xinference in your self-hosted Dify setup (version 0.7.0, running on Docker), you need to make sure that the `voice` parameter is correctly set and validated before invoking the text-to-speech model. Here is the relevant part of the code that ensures the correct `voice` parameter is used:
```python
def _tts_invoke_streaming(self, model: str, credentials: dict, content_text: str, voice: str) -> any:
    """
    _tts_invoke_streaming text2speech model

    :param model: model name
    :param credentials: model credentials
    :param content_text: text content to be translated
    :param voice: model timbre
    :return: text translated to audio file
    """
    if credentials['server_url'].endswith('/'):
        credentials['server_url'] = credentials['server_url'][:-1]

    try:
        handle = RESTfulAudioModelHandle(credentials['model_uid'], credentials['server_url'], auth_headers={})

        model_support_voice = [x.get("value") for x in
                               self.get_tts_model_voices(model=model, credentials=credentials)]
        if not voice or voice not in model_support_voice:
            voice = self._get_model_default_voice(model, credentials)
        word_limit = self._get_model_word_limit(model, credentials)
        if len(content_text) > word_limit:
            sentences = self._split_text_into_sentences(content_text, max_length=word_limit)
            executor = concurrent.futures.ThreadPoolExecutor(max_workers=min(3, len(sentences)))
            futures = [executor.submit(
                handle.speech,
                input=sentences[i],
                voice=voice,
                response_format="mp3",
                speed=1.0,
                stream=False
            ) for i in range(len(sentences))]

            for index, future in enumerate(futures):
                response = future.result()
                for i in range(0, len(response), 1024):
                    yield response[i:i + 1024]
        else:
            response = handle.speech(
                input=content_text.strip(),
                voice=voice,
                response_format="mp3",
                speed=1.0,
                stream=False
            )

            for i in range(0, len(response), 1024):
                yield response[i:i + 1024]
    except Exception as ex:
        raise InvokeBadRequestError(str(ex))
```
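The 1024-byte streaming loop used above can be sketched in isolation (a standalone helper for illustration, not a function from the Dify codebase):

```python
def iter_chunks(payload: bytes, size: int = 1024):
    """Yield fixed-size slices of an audio payload, mirroring the
    1024-byte streaming loops in _tts_invoke_streaming."""
    for i in range(0, len(payload), size):
        yield payload[i:i + size]

# 2500 bytes split into two full chunks plus a 452-byte remainder.
chunks = list(iter_chunks(b"x" * 2500))
print([len(c) for c in chunks])  # → [1024, 1024, 452]
```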
To avoid the error where the `voice` parameter is set to "default" instead of the specified value, ensure that:

1. The `voice` parameter is correctly passed to the `_tts_invoke_streaming` method.
2. The `voice` parameter is included in the list of supported voices for the model.

If the `voice` parameter is not specified or is not in the list of supported voices, the code defaults to the model's default voice:

```python
if not voice or voice not in model_support_voice:
    voice = self._get_model_default_voice(model, credentials)
```

Make sure that the `voice` parameter you are passing is valid and supported by the model. You can check the supported voices using the `get_tts_model_voices` method:
```python
def get_tts_model_voices(self, model: str, credentials: dict, language: Optional[str] = None) -> list:
    audio_model_name = credentials.get('audio_model_name', '__default')
    for key, voices in self.model_voices.items():
        if key in audio_model_name:
            if language and language in voices:
                return voices[language]
            elif 'all' in voices:
                return voices['all']
    return self.model_voices['__default']['all']
```
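To make the lookup concrete, here is a minimal standalone sketch of the same substring-matching logic over a hypothetical `model_voices` table (the entries are illustrative, not Dify's actual voice table):

```python
# Hypothetical voice table: keys are matched as substrings of the configured
# audio model name, with a generic '__default' fallback.
model_voices = {
    '__default': {'all': [{'name': 'Default', 'value': 'default'}]},
    'CosyVoice': {'all': [{'name': '中文女', 'value': '中文女'},
                          {'name': '中文男', 'value': '中文男'}]},
}

def get_tts_model_voices(audio_model_name: str, language=None) -> list:
    # Same shape as the Dify method above, minus self/credentials plumbing.
    for key, voices in model_voices.items():
        if key in audio_model_name:
            if language and language in voices:
                return voices[language]
            elif 'all' in voices:
                return voices['all']
    return model_voices['__default']['all']

print([v['value'] for v in get_tts_model_voices('CosyVoice-300M')])  # → ['中文女', '中文男']
print([v['value'] for v in get_tts_model_voices('some-other-tts')])  # → ['default']
```

Note that an unrecognized model name falls through to the `'default'` voice, which CosyVoice then rejects as an invalid speaker.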
By ensuring the `voice` parameter is valid and supported, you can avoid the issue of it defaulting to "default" [1].
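Put together, the fallback behaves like this standalone sketch (`resolve_voice` and its arguments are illustrative names, not Dify APIs); it shows why an empty `voice` from the frontend silently turns into the model default, and why a `'default'` default then fails against CosyVoice's speaker list:

```python
def resolve_voice(requested: str, supported: list, default_voice: str) -> str:
    # Fall back to the model default when the requested voice is missing
    # or not in the supported list (same condition as _tts_invoke_streaming).
    if not requested or requested not in supported:
        return default_voice
    return requested

supported = ['中文女', '中文男', '日语男', '粤语女', '英文女', '英文男', '韩语女']
print(resolve_voice('', supported, '中文女'))       # → 中文女 (empty input falls back)
print(resolve_voice('中文男', supported, '中文女'))  # → 中文男 (valid input passes through)
# If default_voice were 'default', the fallback itself would be invalid
# for CosyVoice, reproducing the AssertionError seen in the logs.
```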
@dosu After adding the "function" in my application, I did not set the voice, and the default voice used was "voice": "default". Then I set the voice to "Chinese" → "Chinese male", but the problem went back to the beginning and "voice" is empty: {'intput': '记得联系技术支持寻求进一步的帮助。', 'voice': '', 'response_format': 'mp3', 'speed': 1.0, 'stream': False}
Additionally, please help me confirm whether my following modifications are a valid verification method.
Can you give me a temporary Xinference key to verify?
@ic-xu My xinference is deployed locally
With text-to-speech auto-play enabled, the xinference console shows:
Invalid voice default, CosyVoice available speakers: ['中文女', '中文男', '日语男', '粤语女', '英文女', '英文男', '韩语女']
Should be fixed in #7958
Self Checks
Dify version
0.6.16
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
I used the cosyvoice model via xinference, and Dify uses it as the text-to-speech model. No matter how I configure the voice for Text to Speech, the value of the voice passed from Dify to xinference is always empty:
{'intput': '记得联系技术支持寻求进一步的帮助。', 'voice': '', 'response_format': 'mp3', 'speed': 1.0, 'stream': False}
✔️ Expected Behavior
No response
❌ Actual Behavior
No response