xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
3.52k stars, 293 forks

BUG: When Xinference serves the qwen1.5 model, its ReAct prompt handling hides the "Thought" reasoning step, which breaks reasoning and prevents applications built on top of the inference engine from using agents normally. #1691

Open: okwinds opened this issue 1 week ago

okwinds commented 1 week ago

Background context: https://qwen.readthedocs.io/zh-cn/latest/framework/function_call.html Originally posted by @GabrielXie in https://github.com/xorbitsai/inference/pull/1598#issuecomment-2159605576

https://github.com/xorbitsai/inference/issues/1685


Title: Bug report: ReAct agent reasoning fails for "qwen1.5" in Xinference because the "Thought" reasoning step is hidden

Description:

I built a demo that uses LangChain to run ReAct-style reasoning on the "qwen1.5" model. The task requires two consecutive tool calls; when the model is served by Xinference, the agent hits a reasoning bug and never reaches the correct conclusion.

Step 1: The user issues a request.
Step 2: The model reasons that it needs to call a search engine.
Step 3: After obtaining and extracting information from the search engine, the model reasons that it needs to call a second tool to transform that information.
Step 4: Based on the results of the two tool calls, the model produces the Final Answer.
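The four steps above can be sketched as a minimal ReAct loop. Everything here is hypothetical and not taken from the attached demo: `search` and `to_feet` stand in for the search engine and the conversion tool, and `scripted_llm` replays fixed completions instead of calling qwen1.5. What it illustrates is that each step's full completion, `Thought:` line included, is parsed for an action and then appended to the scratchpad that the next step is prompted with.

```python
import re
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for the demo's search engine and conversion tool.
def search(query: str) -> str:
    return "The Eiffel Tower is 330 metres tall."


def to_feet(text: str) -> str:
    metres = float(re.search(r"\d+(?:\.\d+)?", text).group(0))
    return f"{metres * 3.28084:.0f} feet"


TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "to_feet": to_feet}

# Fixed completions replayed in place of a real qwen1.5 call (hypothetical).
SCRIPTED: List[str] = [
    "Thought: I need to look up the height.\nAction: search\nAction Input: Eiffel Tower height",
    "Thought: The height is in metres; convert it.\nAction: to_feet\nAction Input: 330 metres",
    "Thought: I now know the final answer.\nFinal Answer: about 1083 feet",
]


def scripted_llm(question: str, scratchpad: str, step: int) -> str:
    # A real model would be prompted with the question plus the scratchpad.
    return SCRIPTED[step]


ACTION_RE = re.compile(r"Action\s*:\s*(.*?)\s*Action\s*Input\s*:\s*(.*)", re.DOTALL)


def run_agent(
    question: str, llm: Callable[[str, str, int], str], max_steps: int = 5
) -> Tuple[str, str]:
    scratchpad = ""
    for step in range(max_steps):
        completion = llm(question, scratchpad, step)
        if "Final Answer:" in completion:
            return completion.split("Final Answer:")[-1].strip(), scratchpad
        match = ACTION_RE.search(completion)
        if match is None:
            raise ValueError(f"Could not parse LLM output: {completion!r}")
        tool, tool_input = match.group(1).strip(), match.group(2).strip()
        observation = TOOLS[tool](tool_input)
        # The whole completion, Thought included, feeds the next step's prompt.
        scratchpad += f"{completion}\nObservation: {observation}\n"
    raise RuntimeError("Agent stopped without a final answer")


answer, scratchpad = run_agent("How tall is the Eiffel Tower in feet?", scripted_llm)
print(answer)  # about 1083 feet
```

If the completions reaching this loop had their `Thought:`/`Action:` text removed, the parse step would raise at step 2 and the loop would never reach a final answer.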

The demo source code is attached (bug reproduction source: qwreact.txt). To see in more detail why hiding "Thought" in the inference engine causes reasoning bugs in the application layer, read the LangChain agent source code: https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/agents/agent.py https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/agents
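LangChain's text-based ReAct agents parse each completion for an `Action:` / `Action Input:` pair or a `Final Answer:` line. The sketch below is a simplified re-implementation of that parsing step, not LangChain's own parser class. The `stripped` input is an assumption about what the application receives when the server hides the reasoning text (for example, if the action were moved into a separate structured tool-call field and the text content left empty); the exact Xinference response is not shown in this issue.

```python
import re
from typing import Tuple, Union

# Simplified pattern for the "Action: <tool>\nAction Input: <input>" block.
# Modeled loosely on the ReAct text format; not LangChain's own code.
ACTION_RE = re.compile(
    r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)",
    re.DOTALL,
)
FINAL_ANSWER = "Final Answer:"


class ParseError(ValueError):
    """Raised when a completion matches neither an action nor a final answer."""


def parse_react(text: str) -> Union[Tuple[str, str], str]:
    """Return (tool, tool_input) for an action step, or the final answer text."""
    if FINAL_ANSWER in text:
        return text.split(FINAL_ANSWER)[-1].strip()
    match = ACTION_RE.search(text)
    if match is None:
        raise ParseError(f"Could not parse LLM output: `{text}`")
    tool_input = match.group(2).strip().strip('"')
    return match.group(1).strip(), tool_input


full = 'Thought: I should search first.\nAction: search\nAction Input: "Eiffel Tower height"'
# Hypothetical: the same turn after the server hides the reasoning text.
stripped = ""

print(parse_react(full))  # ('search', 'Eiffel Tower height')
try:
    parse_react(stripped)
except ParseError as err:
    print(err)  # Could not parse LLM output: ``
```

The agent only sees text, so any server-side rewriting that drops the `Action:` lines leaves the parser nothing to match, regardless of how correct the model's underlying decision was.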

The screenshots below show the result. With qwen2, whose code path has no special "Thought"-hiding logic, the agent reasons normally, invokes the tools twice, and reaches the final answer:

[Screenshots of the qwen2 run]

With qwen1.5, whose code path applies the special "Thought"-hiding treatment, the agent cannot reason properly, cannot invoke tools, and never arrives at the final answer:

[Screenshot of the qwen1.5 run on Xinference]

If we switch to ollama to serve the qwen1.5 model, which has no special logic for hiding "Thought", the agent executes normally and obtains the result:

[Screenshot of the qwen1.5 run on ollama]

Originally posted by @okwinds in https://github.com/xorbitsai/inference/issues/1685#issuecomment-2183214339

JinCheng666 commented 1 week ago

mark