GLM4-9B proxied through Xinference fails to return tool call information #2328

Open readbyte-ai opened 6 days ago

readbyte-ai commented 6 days ago

System Info

CUDA 12.1, transformers 4.44.0, vllm 0.5.4, Python 3.10, Ubuntu 22.04, conda 24.5.0

Running Xinference with Docker?

Version info / 版本信息

xinference 0.15.1

The command used to start Xinference

xinference-local -H 0.0.0.0

Reproduction

1. Through the OpenAI API interface provided by Xinference, ask the GLM4-9B model a weather question, expecting the LLM to respond with a tool call for the bound tool.

import os
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.messages import HumanMessage

os.environ['TAVILY_API_KEY'] = '<API key from Tavily Search>'

# Point ChatOpenAI at Xinference's OpenAI-compatible endpoint; the API key is a placeholder
model = ChatOpenAI(model="glm4-chat", openai_api_base="http://10.168.3.86:9997/v1", openai_api_key="EMPT")

search = TavilySearchResults(max_results=2)
search_results = search.invoke("what is the weather in SF")
print(search_results)
# If we want, we can create other tools.
# Once we have all the tools we want, we can put them in a list that we will reference later.
tools = [search]

response = model.invoke([HumanMessage(content="hi!")])
print(response.content)

# Bind the [search] tool list to the model
model_with_tools = model.bind_tools(tools)
# Ask about the weather
response = model_with_tools.invoke([HumanMessage(content="What's the weather in SF?")])

print(f"ContentString: {response.content}")

# Expect tool call information for the bound tool in the model's response
print(f"ToolCalls: {response.tool_calls}")

The returned ToolCalls is empty, i.e. the GLM4-9B LLM did not respond with any tool call information:

Hi 👋! Hello and welcome to ChatGLM, how can I help you today?
ContentString: {'id': '8e75c028-763d-11ef-af6c-08bfb8b931fb', 'object': 'text_completion', 'created': 1726719223, 'model': 'glm4-chat', 'choices': [{'text': "\nI'm sorry, but I can't provide real-time weather information. To know the current weather in San Francisco, you can check a weather website, use a weather app on your smartphone, or consult a weather forecast on a local news channel or online.", 'index': 0, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 14, 'completion_tokens': 53, 'total_tokens': 67}}
ToolCalls: []
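
To rule out LangChain's message parsing, one quick check is to hit the same Xinference endpoint directly with the plain openai client and an explicit tools payload. This is a diagnostic sketch, not code from the report: the tool schema below is a hand-written stand-in for what LangChain's bind_tools would generate from TavilySearchResults.

from openai import OpenAI

# Diagnostic sketch: call Xinference's OpenAI-compatible endpoint directly,
# bypassing LangChain, to see whether tool_calls is populated at all.
client = OpenAI(base_url="http://10.168.3.86:9997/v1", api_key="EMPT")

# Hand-written stand-in for the schema bind_tools would generate.
tools = [{
    "type": "function",
    "function": {
        "name": "tavily_search_results_json",
        "description": "Search the web for current information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm4-chat",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=tools,
)
msg = resp.choices[0].message
print("content:", msg.content)
print("tool_calls:", msg.tool_calls)  # None/empty here reproduces the bug server-side

If tool_calls is already empty at this level, the problem lies in the backend rather than in LangChain.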

When I instead start an OpenAI-compatible service (on port 8000) using the sample code from the official GLM4 project: https://github.com/THUDM/GLM-4/tree/main/basic_demo/openai_api_server.py

and change the ChatOpenAI construction in the code above to point at that address:

model = ChatOpenAI(model="glm-4", openai_api_base="http://10.168.3.86:8000/v1", openai_api_key="EMPT")

then GLM4-9B does return the tool call information in ToolCalls:

Hello 👋! Nice to meet you. How can I help you?
ContentString: 
ToolCalls: [{'name': 'tavily_search_results_json', 'args': {'query': 'weather in San Francisco'}, 'id': 'call_rCX1iGOv2rbcQI664cUROS3N', 'type': 'tool_call'}]

Expected behavior

Please fix this bug so that when Xinference proxies GLM4-9B through its OpenAI API and tools are bound via LangChain, the proxied GLM4-9B returns tool call information just like the official sample service does.

codingl2k1 commented 3 days ago

Are you using the transformers backend or the vllm backend?

yuanzhiwei commented 3 days ago

> Are you using the transformers backend or the vllm backend?

I'm using vllm and see this problem too.

readbyte-ai commented 3 days ago

> Are you using the transformers backend or the vllm backend?

I'm using the vllm engine; running this code on the transformers engine raises an exception and fails.

codingl2k1 commented 2 days ago

Tool call is not implemented for the vLLM backend yet; the transformers backend should work. What error message do you get with the transformers backend?
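
For anyone who wants to try that, here is a minimal sketch of relaunching glm4-chat on the transformers engine, assuming the Xinference 0.15.x Python client and the host from the report; parameter names may differ across versions:

from xinference.client import Client

# Sketch: relaunch glm4-chat on the transformers engine instead of vLLM,
# since tool calls are currently only handled there.
client = Client("http://10.168.3.86:9997")
model_uid = client.launch_model(
    model_name="glm4-chat",
    model_engine="transformers",
)
print(model_uid)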

readbyte-ai commented 2 days ago

> Tool call is not implemented for the vLLM backend yet; the transformers backend should work. What error message do you get with the transformers backend?

Understood. I've been using vLLM because it's faster and more stable; looking forward to the implementation.

The client uses langchain 0.3.0. The Xinference server runs glm4-chat on the transformers engine (transformers 4.44.2).

Running the code above on the client, the Xinference server shows no exception, but the LangChain client throws the following:

Traceback (most recent call last):
  File "/home/fangshun/ai/exercise/langchain/test8.py", line 23, in <module>
    response = model_with_tools.invoke([HumanMessage(content="What's the weather in SF?")])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 5313, in invoke
    return self.bound.invoke(
           ^^^^^^^^^^^^^^^^^^
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 286, in invoke
    self.generate_prompt(
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 786, in generate_prompt
    return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 643, in generate
    raise e
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 633, in generate
    self._generate_with_cache(
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 855, in _generate_with_cache
    result = self._generate(
             ^^^^^^^^^^^^^^^
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/langchain_openai/chat_models/base.py", line 671, in _generate
    return self._create_chat_result(response, generation_info)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/langchain_openai/chat_models/base.py", line 708, in _create_chat_result
    message = _convert_dict_to_message(res["message"])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/langchain_openai/chat_models/base.py", line 127, in _convert_dict_to_message
    return AIMessage(
           ^^^^^^^^^^
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/langchain_core/messages/ai.py", line 94, in __init__
    super().__init__(content=content, **kwargs)
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/langchain_core/messages/base.py", line 75, in __init__
    super().__init__(content=content, **kwargs)
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/langchain_core/load/serializable.py", line 112, in __init__
    super().__init__(*args, **kwargs)
  File "/home/fangshun/miniconda3/envs/langchain/lib/python3.11/site-packages/pydantic/main.py", line 212, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for AIMessage
tool_calls.0.args
  Input should be a valid dictionary [type=dict_type, input_value='{"query": "weather in SF"}', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/dict_type
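
The validation error says tool_calls[0].args reached AIMessage as the raw JSON string '{"query": "weather in SF"}' where a parsed dict is expected, which suggests the transformers backend returns tool call arguments in a shape LangChain does not decode. Until that is fixed server-side, one possible client-side workaround (a sketch under that assumption, not an official API) is to skip LangChain's message conversion, call the endpoint with the raw openai client, and decode the arguments manually:

import json
from openai import OpenAI

client = OpenAI(base_url="http://10.168.3.86:9997/v1", api_key="EMPT")

# Compact stand-in tool schema, same as in the earlier diagnostic sketch.
tools = [{
    "type": "function",
    "function": {
        "name": "tavily_search_results_json",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm4-chat",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=tools,
)

for call in resp.choices[0].message.tool_calls or []:
    # function.arguments arrives as a JSON string; json.loads turns it into
    # the dict that downstream tool-dispatch code expects.
    args = json.loads(call.function.arguments)
    print(call.function.name, args)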