kingdomad opened 5 days ago
After some debugging, I found a logic error in the async_chat method of xinference.model.llm.vllm.core.VLLMChatModel: during pre-processing, glm4's tools are discarded, but post-processing still tries to parse tool calls.
```python
async def async_chat(
    self,
    messages: List[Dict],
    generate_config: Optional[Dict] = None,
    request_id: Optional[str] = None,
) -> Union[ChatCompletion, AsyncGenerator[ChatCompletionChunk, None]]:
    tools = generate_config.pop("tools", []) if generate_config else None
    model_family = self.model_family.model_family or self.model_family.model_name
    full_context_kwargs = {}
    # Pre-processing: tools are only forwarded to the chat template for
    # Qwen models, so glm4's tools are silently dropped here.
    if tools and model_family in QWEN_TOOL_CALL_FAMILY:
        full_context_kwargs["tools"] = tools
    assert self.model_family.chat_template is not None
    full_prompt = self.get_full_context(
        messages, self.model_family.chat_template, **full_context_kwargs
    )
    generate_config = self._sanitize_chat_config(generate_config)
    stream = generate_config.get("stream", None)
    if stream:
        agen = await self.async_generate(
            full_prompt, generate_config, tools, request_id=request_id
        )
        assert isinstance(agen, AsyncGenerator)
        # Post-processing: gated on `tools` alone, so glm4 responses are
        # still run through tool-call parsing even though the prompt
        # never contained the tools.
        if tools:
            return self._async_to_tool_completion_chunks(agen)
        return self._async_to_chat_completion_chunks(agen)
    else:
        c = await self.async_generate(
            full_prompt, generate_config, request_id=request_id
        )
        assert not isinstance(c, AsyncGenerator)
        if tools:
            return self._tool_calls_completion(self.model_family, self.model_uid, c)
        return self._to_chat_completion(c)
```
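The core of the problem is that the pre-processing gate and the post-processing gate use different conditions. A minimal, self-contained sketch of a possible fix, using one shared condition for both stages (the helper names and family sets here are hypothetical, not the actual Xinference API):

```python
# Hypothetical family sets for illustration; the real constants live in
# xinference.model.llm. The point is that pre- and post-processing must
# agree on whether a model family supports tool calls.
QWEN_TOOL_CALL_FAMILY = {"qwen1.5-chat", "qwen2-instruct"}
GLM_TOOL_CALL_FAMILY = {"glm4-chat"}  # assumption: glm4 supports tool calls
TOOL_CALL_FAMILY = QWEN_TOOL_CALL_FAMILY | GLM_TOOL_CALL_FAMILY

def build_context_kwargs(model_family: str, tools: list) -> dict:
    """Pre-processing: decide whether tools reach the chat template."""
    kwargs = {}
    # Buggy version checked QWEN_TOOL_CALL_FAMILY only, so glm4's tools
    # were dropped here while post-processing still parsed tool calls.
    if tools and model_family in TOOL_CALL_FAMILY:
        kwargs["tools"] = tools
    return kwargs

def should_parse_tool_calls(model_family: str, tools: list) -> bool:
    """Post-processing: parse tool calls only if tools were actually sent."""
    return bool(tools) and model_family in TOOL_CALL_FAMILY

tools = [{"type": "function", "function": {"name": "get_weather"}}]
# With a shared condition, the two stages can no longer disagree:
assert ("tools" in build_context_kwargs("glm4-chat", tools)) == \
    should_parse_tool_calls("glm4-chat", tools)
```

Whether the right fix is to add glm4 to the pre-processing gate or to skip tool parsing when the tools were dropped is a maintainer decision; either way, the two checks should be derived from the same condition.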
System Info
ubuntu 22.04
Running Xinference with Docker?
Version info
0.15.0
The command used to start Xinference
xinference-local --host 0.0.0.0 --port 9997
Reproduction
Startup command
Inference request
Response received:
Backend logs:
Expected behavior
The content field should be correctly formatted.