xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

It seems that this preserves the ReAct "thought" process, but this is not user-friendly. Is it possible to use the method of the official Qwen function call? #1685

Closed. okwinds closed this issue 2 days ago.

okwinds commented 4 months ago
It seems that this preserves the ReAct "thought" process, but this is not user-friendly. Is it possible to use the method of the official Qwen function call?

https://qwen.readthedocs.io/zh-cn/latest/framework/function_call.html

Originally posted by @GabrielXie in https://github.com/xorbitsai/inference/issues/1598#issuecomment-2159605576

Re https://github.com/xorbitsai/inference/pull/1598: the inference engine should stick to its own duties; whether the user experience is friendly is a decision for the business domain. The official Qwen function-call method should not hide the ReAct "thought" reasoning process. The logic for concealing "thought" information does not belong in the model layer's inference engine, where it makes decisions on the business's behalf. Such a design is undesirable in a domain-separated architecture, and I hope the maintainers consider this reasoning logic carefully. Pushing a "good experience" through to end users this way may instead create risks and confusion for many more inference workloads.

The business logic should be decoupled at the architecture level.

zhanghx0905 commented 4 months ago

I think it would be useful to have a global variable to control whether this feature takes effect.
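For illustration, a minimal sketch of what such a toggle could look like, assuming an environment variable read at startup. The variable name and helper below are hypothetical, not part of xinference:

```python
import os

# Hypothetical toggle: show the ReAct "thought" text by default and hide it
# only when the deployer explicitly opts out via an environment variable.
SHOW_REACT_THOUGHT = os.environ.get("XINFERENCE_SHOW_REACT_THOUGHT", "1") == "1"

def postprocess_react_output(text: str) -> str:
    """Strip 'Thought:' lines from a ReAct-style completion when hiding is on."""
    if SHOW_REACT_THOUGHT:
        return text
    return "\n".join(
        line for line in text.splitlines()
        if not line.lstrip().startswith("Thought:")
    )
```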

zhanghx0905 commented 4 months ago

take

okwinds commented 4 months ago

I think it would be useful to have a global variable to control whether this feature takes effect.

Yep, this might be a reasonable compromise. Based on your proposal, my suggestion is to display the ReAct "thought" content by default and let users hide it via a configuration variable to accommodate special needs. Thanks a lot for your work.

okwinds commented 4 months ago

To add to this: regarding the qwen1.5 model's ReAct prompt hiding the "thought" reasoning process, I built a demo that uses Langchain for ReAct reasoning and requires two tool calls. It hits a reasoning bug and cannot reach the correct conclusion. The demo source code is attached. Demo bug reproduction source code: qwreact.txt. For more detail, please refer to the Langchain agent source code: https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/agents/agent.py and https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/agents
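For context, a comparable two-tool Langchain ReAct setup looks roughly like the sketch below. This is not the attached qwreact.txt; the endpoint URL, model name, and tools are placeholder assumptions:

```python
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

# Placeholder endpoint/model: point the OpenAI-compatible client at a
# locally served model (e.g. an xinference or ollama deployment).
llm = ChatOpenAI(
    base_url="http://127.0.0.1:9997/v1",  # hypothetical local endpoint
    api_key="not-needed",
    model="qwen1.5-chat",  # hypothetical deployed model name
)

@tool
def get_exchange_rate(pair: str) -> str:
    """Return a canned exchange rate for a currency pair."""
    return "7.25"

@tool
def multiply(expression: str) -> str:
    """Multiply two comma-separated numbers, e.g. '100,7.25'."""
    a, b = (float(x) for x in expression.split(","))
    return str(a * b)

tools = [get_exchange_rate, multiply]

# Standard ReAct prompt; the client-side agent loop parses the model's
# "Thought:/Action:/Action Input:" lines to decide which tool to call.
prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# A task that needs two tool calls: fetch the rate, then do the math.
executor.invoke({"input": "Get the USD/CNY rate, then convert 100 USD to CNY."})
```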

Screenshots of the invocation results follow. qwen2, with no special hiding logic, reasons normally: it invokes the tool twice and reaches the final answer.

[screenshots]

qwen1.5, with the special "thought"-hiding logic applied, cannot reason properly: it fails to invoke the tools and cannot arrive at the final answer, as shown below:

[screenshot]

If we switch to ollama to run qwen1.5, which has no special "thought"-hiding logic, the agent executes normally and obtains the result, as shown below:

[screenshot]

zhanghx0905 commented 4 months ago

To add to this: regarding the qwen1.5 model's ReAct prompt hiding the "thought" reasoning process, I built a demo that uses Langchain for ReAct reasoning and requires two tool calls. It hits a reasoning bug and cannot reach the correct conclusion. The demo source code is attached. Demo bug reproduction source code: [qwreact.txt]

I'm pretty sure there is no relationship between the so-called "hidden" thought process and whether or not the model invokes the tool; it's just a filter on the output.

As for the example you gave, LLM tool calls are inherently unstable and unreliable. Changing the prompt might produce the exact opposite effect.
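One nuance worth noting: the ReAct loop runs client-side, so the agent's output parser, not the server, consumes the "Thought:/Action:" markers. Below is a simplified, illustrative approximation of the kind of pattern such a parser applies (Langchain's real parsers live under langchain/agents/output_parsers):

```python
import re

# An illustrative ReAct-style completion as the client would receive it.
completion = """Thought: I need the exchange rate first.
Action: get_exchange_rate
Action Input: USD/CNY"""

# Simplified approximation of a ReAct output parser: it only needs the
# Action/Action Input markers, so stripping "Thought:" lines server-side
# would not by itself prevent tool invocation.
match = re.search(r"Action\s*:\s*(.*?)\nAction\s*Input\s*:\s*(.*)", completion, re.DOTALL)
if match:
    tool_name = match.group(1).strip()
    tool_input = match.group(2).strip()
    print(f"invoke {tool_name!r} with {tool_input!r}")
```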

okwinds commented 4 months ago

To add to this: regarding the qwen1.5 model's ReAct prompt hiding the "thought" reasoning process, I built a demo that uses Langchain for ReAct reasoning and requires two tool calls. It hits a reasoning bug and cannot reach the correct conclusion. The demo source code is attached. Demo bug reproduction source code: [qwreact.txt]

I'm pretty sure there is no relationship between the so-called "hidden" thought process and whether or not the model invokes the tool; it's just a filter on the output.

As for the example you gave, LLM tool calls are inherently unstable and unreliable. Changing the prompt might produce the exact opposite effect.

I've examined a portion of the source code and I'm uncertain if it covers everything. Could you confirm whether the code appears to have implemented "thought" filtering only for the "function calling" method, without addressing "thought" filtering for the "prompt react" approach?

JinCheng666 commented 4 months ago

Mark. Waiting for the final conclusion.

zhanghx0905 commented 4 months ago

Thanks for that. I suddenly realized there are actually multiple scenarios to consider, for example, which inference backend is used (vllm or not) and whether stream response is enabled. My PR only considered the case where vllm is used and stream is enabled, so it shouldn't be merged as-is.

You should double-check the inference backend and options you're using. From the screenshot you followed up with, it looks like the so-called hiding mechanism isn't actually working; otherwise you wouldn't be able to see "Final Answer:". I suspect that the library you're using (langchain) doesn't actually call the tools interface. From my previous experience, qwen-agent does not use the tools interface either; it implements its own tools mechanism on the client side.

okwinds commented 4 months ago

I've examined a portion of the source code and I'm uncertain if it covers everything. Could you confirm whether the code appears to have implemented "thought" filtering only for the "function calling" method, without addressing "thought" filtering for the "prompt react" approach?

You're precisely correct: the essence of Langchain's ReAct is not direct tool invocation; it hinges on prompting, then processes the model's responses to achieve indirect tool invocation. As I mentioned in my previous response,

"I've examined a portion of the source code and I'm uncertain if it covers everything. Could you confirm whether the code appears to have implemented 'thought' filtering only for the 'function calling' method, without addressing 'thought' filtering for the 'prompt react' approach?"

Can you confirm the above content?

If this "thought" hiding does not occur when processing an externally supplied ReAct prompt, then the reasoning error in the final result is unrelated to the hidden "thought" information in tool calls. Such a result is more likely strongly correlated with issues inherent in the model itself. I have found that qwen1.5-32B may have inference issues in its int4 quantization; this could be a problem with the model itself.

zhanghx0905 commented 4 months ago

Yeah. I've been using qwen 32B int4 in my business for the last two months, and I've found that its tool calls are very unstable, with different prompts and even different random seeds affecting it.

I don't quite understand your question. What do you mean by addressing 'thought' filtering for the 'prompt react' approach? Can you give me an example? I don't think xinference implements such a feature.

okwinds commented 4 months ago

Yeah. I've been using qwen 32B int4 in my business for the last two months, and I've found that its tool calls are very unstable, with different prompts and even different random seeds affecting it.

I don't quite understand your question. What do you mean by addressing 'thought' filtering for the 'prompt react' approach? Can you give me an example? I don't think xinference implements such a feature.

OK, so:

1. Xinference has filtered "Thought" information only under the "function calling" capability.
2. Xinference has not performed any information filtering on the reasoning responses produced by directly input prompts.

Is my understanding of points 1 and 2 correct?

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 7 days with no activity.