Closed wuminghui-coder closed 2 months ago
To enable support for streaming requests and tool calls for LLM models under Xinference in Dify version 0.6.11, you need to set the appropriate flags in the credentials. Specifically, set `support_function_call` to `True` for tool calls, and ensure `stream` is set to `True` for streaming requests.
Here is the relevant code snippet that shows how these flags are used:

```python
support_function_call = credentials.get('support_function_call', False)
if support_function_call:
    features.append(ModelFeature.TOOL_CALL)

# ...

def _generate(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
              model_parameters: dict, extra_model_kwargs: XinferenceModelExtraParameter,
              tools: list[PromptMessageTool] | None = None,
              stop: list[str] | None = None, stream: bool = True, user: str | None = None) \
        -> LLMResult | Generator:
    # ...
    if tools and len(tools) > 0:
        generate_config['tools'] = [
            {
                'type': 'function',
                'function': helper.dump_model(tool)
            } for tool in tools
        ]
    # ...
    if stream:
        if tools and len(tools) > 0:
            raise InvokeBadRequestError('xinference tool calls does not support stream mode')
        return self._handle_chat_stream_response(model=model, credentials=credentials,
                                                 prompt_messages=prompt_messages,
                                                 tools=tools, resp=resp)
    return self._handle_chat_generate_response(model=model, credentials=credentials,
                                               prompt_messages=prompt_messages,
                                               tools=tools, resp=resp)
```
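For illustration, here is a minimal sketch of the payload shape that the `generate_config['tools']` mapping above produces. The tool definition is hypothetical (the field names follow the common OpenAI-style function schema, not necessarily what `helper.dump_model` emits for Dify's `PromptMessageTool`):

```python
# Hypothetical serialized tool; field names are assumptions based on the
# OpenAI-style function-calling schema, not Dify internals.
tool = {
    'name': 'get_weather',
    'description': 'Get the current weather for a city',
    'parameters': {
        'type': 'object',
        'properties': {'city': {'type': 'string'}},
        'required': ['city'],
    },
}

# Mirrors the list comprehension in the snippet above: each tool is
# wrapped in a {'type': 'function', 'function': ...} envelope.
generate_config = {}
generate_config['tools'] = [{'type': 'function', 'function': tool}]
```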
To enable these features, you should configure your credentials as follows:
```python
credentials = {
    'server_url': 'your_server_url',
    'model_uid': 'your_model_uid',
    'support_function_call': True,  # Enable tool calls
    'context_length': 2048,
    # other necessary credentials
}

# Ensure stream is set to True when invoking the model
result = xinference_model._generate(
    model='your_model',
    credentials=credentials,
    prompt_messages=your_prompt_messages,
    model_parameters=your_model_parameters,
    tools=your_tools,  # List of tools, if any
    stop=your_stop_conditions,
    stream=True,  # Enable streaming
    user='your_user_id'
)
```
This configuration will enable support for both streaming requests and tool calls for LLM models under Xinference in Dify version 0.6.11.
However, please note that function calling in Xinference does not currently support stream mode, as indicated in the integration tests:

```python
"""
Function calling of xinference does not support stream mode currently
"""
# def test_invoke_stream_chat_model_with_functions():
#     model = XinferenceAILargeLanguageModel()
#     ...
```
Therefore, while you can enable streaming or tool calls individually, combining them is not supported at this time [1][2].
To continue talking to Dosu, mention @dosu.
Closing.
It looks like you want to close the issue regarding the integration of LLM request streaming between the Dify and Xinference platforms. If you have any further questions or need additional assistance in the future, feel free to reach out. Thank you!
Self Checks
Dify version
0.6.11
Cloud or Self Hosted
Cloud
Steps to reproduce
Dify is not fully integrated with the Xinference platform. For LLM models under Xinference, Dify does not support streaming requests, and streaming tool calls are not supported either. I added print statements and read through the code myself; this part is not fully adapted.
✔️ Expected Behavior
No response
❌ Actual Behavior
No response