zhfeng opened this issue 1 month ago
This is definitely interesting, but it's actually harder than it looks.
Specifically about:
> I mean, the LLM might first use the tools it needs, and then start streaming its response back
This can't be done, because until you get the entire response from the LLM, you don't know whether a tool execution is necessary (remember that a single query from the user can result in the invocation of multiple tools).
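To make the constraint concrete, here is a minimal sketch, assuming the langchain4j 0.x `StreamingResponseHandler` API: a handler only learns whether the model requested a tool execution in `onComplete`, after the full response has arrived, so everything streamed before that point can only be buffered.

```java
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.output.Response;

// Illustrative only: tokens arrive one by one, but whether a tool must be
// executed is only known once the complete AiMessage is delivered.
StreamingResponseHandler<AiMessage> handler = new StreamingResponseHandler<>() {

    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void onNext(String token) {
        // We cannot tell yet whether the model will ultimately request
        // a tool execution, so all we can safely do is buffer.
        buffer.append(token);
    }

    @Override
    public void onComplete(Response<AiMessage> response) {
        AiMessage message = response.content();
        if (message.hasToolExecutionRequests()) {
            // Only now do we know that tools must be invoked, and that
            // the buffered tokens were not the final user-facing answer.
            message.toolExecutionRequests().forEach(request ->
                    System.out.println("Tool requested: " + request.name()));
        } else {
            System.out.println("Final answer: " + buffer);
        }
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
};
```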
@zhfeng mind attaching the sample you used to surface the problem?
Sure, it is https://github.com/zhfeng/camel-ai-for-pnc
I added BaconTools
Okay, thanks
So this is actually specific to Ollama; with OpenAI, for example, this works fine.
@langchain4j I see that in upstream, Ollama also doesn't support tool invocation for streaming responses. Is there a technical reason for this, or was it just a matter of priorities?
Thanks
@geoand good catch! I did not know it was not implemented for Ollama in streaming mode; let me create an issue for this.
👍🏽
@zhfeng for the time being, Quarkus uses its own implementation of `StreamingChatModel` for Ollama, so if you would like to contribute this feature, that would be great.
Essentially you need to implement:

```java
OllamaStreamingChatLanguageModel#generate(List<ChatMessage> messages,
        List<ToolSpecification> toolSpecifications,
        StreamingResponseHandler<AiMessage> handler)
```
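If it helps, here is a rough skeleton of what such an implementation could look like. This is only a sketch, not the actual Quarkus code: it assumes that falling back to a blocking request is acceptable when tools are supplied, and `blockingGenerate` is a hypothetical helper, not an existing method.

```java
import java.util.List;

import dev.langchain4j.agent.tool.ToolSpecification;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.output.Response;

@Override
public void generate(List<ChatMessage> messages,
                     List<ToolSpecification> toolSpecifications,
                     StreamingResponseHandler<AiMessage> handler) {
    try {
        // Since Ollama does not stream tool calls, one option is to issue
        // a regular (non-streaming) request whenever tool specifications
        // are present. 'blockingGenerate' is a hypothetical helper here.
        Response<AiMessage> response = blockingGenerate(messages, toolSpecifications);
        AiMessage message = response.content();
        if (!message.hasToolExecutionRequests()) {
            // Deliver the text as a single chunk so callers still see
            // the usual onNext/onComplete sequence.
            handler.onNext(message.text());
        }
        handler.onComplete(response);
    } catch (Exception e) {
        handler.onError(e);
    }
}
```

The trade-off of this shape is that requests involving tools lose streaming altogether, which matches what Ollama itself supported at the time.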
You could draw inspiration from how it's been done in `WatsonxChatModel`.
Thanks @geoand and @langchain4j
Yeah, I will take a look.
🙏🏽
As far as I know, the latest version of Ollama doesn't support the use of tools when streaming mode is enabled.
You mean our integration or Ollama itself?
Ollama itself
Interesting, thanks
> Now calling a function is not supported in the streaming response; it throws an exception like:

(Quote from Bruno Meseguer's comment.)

It would be great if we could have tool support in the streaming response as well.