quarkiverse / quarkus-langchain4j

Quarkus Langchain4j extension
https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html
Apache License 2.0

Tool support in the streaming response #988

Open zhfeng opened 1 month ago

zhfeng commented 1 month ago

Currently, calling a function is not supported in the streaming response; it throws an exception like

2024-10-15 16:55:38,809 ERROR [io.qua.web.nex.run.WebSocketEndpointBase] (executor-thread-1) Unable to send text message from Multi: WebSocket connection [endpointId=io.quarkiverse.langchain4j.sample.chatbot.ChatBotWebSocket, path=/chatbot, id=362059f0-2b79-4691-9a42-7cae9c730be3] : java.lang.IllegalArgumentException: Tools are currently not supported by this model
    at dev.langchain4j.model.chat.StreamingChatLanguageModel.generate(StreamingChatLanguageModel.java:61)
    at dev.langchain4j.model.chat.StreamingChatLanguageModel_FwyQP9Of9oZwwZhlQz1A4k1Ak7I_Synthetic_ClientProxy.generate(Unknown Source)
    at dev.langchain4j.service.AiServiceTokenStream.start(AiServiceTokenStream.java:116)
    at io.quarkiverse.langchain4j.runtime.aiservice.AiServiceMethodImplementationSupport$MultiEmitterConsumer.accept(AiServiceMethodImplementationSupport.java:713)
    at io.quarkiverse.langchain4j.runtime.aiservice.AiServiceMethodImplementationSupport$MultiEmitterConsumer.accept(AiServiceMethodImplementationSupport.java:680)
    at io.smallrye.context.impl.wrappers.SlowContextualConsumer.accept(SlowContextualConsumer.java:21)
    at io.smallrye.mutiny.operators.multi.builders.EmitterBasedMulti.subscribe(EmitterBasedMulti.java:67)
    at io.smallrye.mutiny.operators.AbstractMulti.subscribe(AbstractMulti.java:60)
    at io.smallrye.mutiny.operators.multi.MultiFlatMapOp$FlatMapMainSubscriber.onItem(MultiFlatMapOp.java:182)
    at io.smallrye.mutiny.operators.multi.builders.EmitterBasedMulti$DropLatestOnOverflowMultiEmitter.drain(EmitterBasedMulti.java:220)
    at io.smallrye.mutiny.operators.multi.builders.EmitterBasedMulti$DropLatestOnOverflowMultiEmitter.emit(EmitterBasedMulti.java:153)
    at io.smallrye.mutiny.operators.multi.builders.SerializedMultiEmitter.onItem(SerializedMultiEmitter.java:50)
    at io.smallrye.mutiny.operators.multi.builders.SerializedMultiEmitter.emit(SerializedMultiEmitter.java:140)
    at io.smallrye.mutiny.groups.MultiCreate.lambda$completionStage$2(MultiCreate.java:128)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1773)
    at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:635)
    at org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2516)
    at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2495)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1521)
    at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:11)
    at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:11)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:840)

Quote from Bruno Meseguer's comment

I guess there are no rules on when the stream back starts... I mean, the LLM might first use the tools it needs, and then start streaming its response back

It would be great if we could have tool support in the streaming response as well.
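
For reference, a minimal sketch of the kind of setup that runs into this limitation when an Ollama streaming model is configured. The class reuses the BaconTools name mentioned later in this thread, but the tool method and the assistant interface are illustrative, not taken from the actual sample:

    import dev.langchain4j.agent.tool.Tool;
    import dev.langchain4j.service.UserMessage;
    import io.quarkiverse.langchain4j.RegisterAiService;
    import io.smallrye.mutiny.Multi;
    import jakarta.enterprise.context.ApplicationScoped;

    @ApplicationScoped
    class BaconTools {

        // A trivial tool the model may decide to call (hypothetical method for this sketch)
        @Tool("Returns the bacon stock for the given warehouse")
        int baconStock(String warehouse) {
            return 42; // placeholder value
        }
    }

    @RegisterAiService(tools = BaconTools.class)
    interface StreamingAssistant {

        // Returning Multi<String> makes Quarkus use the streaming chat model;
        // combined with a registered tool, the Ollama streaming model currently
        // fails with "Tools are currently not supported by this model".
        Multi<String> chat(@UserMessage String question);
    }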

geoand commented 1 month ago

This is definitely interesting, but it's actually harder than it looks.

Specifically about:

I mean, the LLM might first use the tools it needs, and then start streaming its response back

this can't be done because until you get the entire response from the LLM, you don't know whether a tool execution is necessary or not (remember that a single query from the user can result in the invocation of multiple tools).
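
To make that concrete, this is roughly the shape of langchain4j's StreamingResponseHandler callback that the AI service layer drives (a sketch of the callback contract, not extension code): partial tokens arrive one by one through onNext, but whether the model wants to call any tools only becomes visible in onComplete, once the whole response has been received.

    import java.util.List;

    import dev.langchain4j.agent.tool.ToolExecutionRequest;
    import dev.langchain4j.data.message.AiMessage;
    import dev.langchain4j.model.StreamingResponseHandler;
    import dev.langchain4j.model.output.Response;

    class IllustrativeHandler implements StreamingResponseHandler<AiMessage> {

        @Override
        public void onNext(String token) {
            // tokens may already be on their way to the client at this point
            System.out.print(token);
        }

        @Override
        public void onComplete(Response<AiMessage> response) {
            // only now can we tell whether the answer is final text
            // or a request to execute one or more tools
            List<ToolExecutionRequest> requests = response.content().toolExecutionRequests();
            if (requests != null && !requests.isEmpty()) {
                // the framework must run the tools and call the model again
            }
        }

        @Override
        public void onError(Throwable error) {
            error.printStackTrace();
        }
    }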

geoand commented 1 month ago

@zhfeng mind attaching the sample you used to surface the problem?

zhfeng commented 1 month ago

Sure, it is https://github.com/zhfeng/camel-ai-for-pnc

I added BaconTools

geoand commented 1 month ago

Okay, thanks

geoand commented 1 month ago

So this is actually specific to Ollama; with OpenAI, for example, this works fine.

@langchain4j I see that in upstream Ollama also doesn't support tool invocation for streaming responses. Is there a technical reason for this, or was it just a matter of priorities?

Thanks

langchain4j commented 1 month ago

@geoand good catch! I did not know it was not implemented for Ollama in streaming mode; let me create an issue for this.

geoand commented 1 month ago

👍🏽

langchain4j commented 1 month ago

Issue: https://github.com/langchain4j/langchain4j/issues/1971

geoand commented 1 month ago

@zhfeng for the time being, Quarkus uses its own StreamingChatLanguageModel implementation for Ollama, so if you would like to contribute this feature, that would be great. Essentially, you need to implement

    OllamaStreamingChatLanguageModel#generate(List<ChatMessage> messages,
                                              List<ToolSpecification> toolSpecifications,
                                              StreamingResponseHandler<AiMessage> handler)

You could draw inspiration from how it's been done in WatsonxChatModel
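
As a rough sketch of what the completion side of that overload has to do (the Ollama request/response plumbing is omitted, and the helper below is purely illustrative, not part of the extension): stream the plain text chunks through onNext, and when the stream ends, complete the handler with either the accumulated text or the tool calls the model asked for, so the AI service layer can execute them and re-invoke the model.

    import java.util.List;

    import dev.langchain4j.agent.tool.ToolExecutionRequest;
    import dev.langchain4j.data.message.AiMessage;
    import dev.langchain4j.model.StreamingResponseHandler;
    import dev.langchain4j.model.output.Response;

    // Purely illustrative helper, not part of quarkus-langchain4j
    final class StreamingCompletion {

        static void complete(StreamingResponseHandler<AiMessage> handler,
                             String accumulatedText,
                             List<ToolExecutionRequest> toolRequests) {
            AiMessage message = toolRequests.isEmpty()
                    ? AiMessage.from(accumulatedText)   // plain streamed answer
                    : AiMessage.from(toolRequests);     // model wants tools executed first
            handler.onComplete(Response.from(message));
        }
    }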

zhfeng commented 1 month ago

Thanks @geoand and @langchain4j

Yeah, I will take a look.

geoand commented 1 month ago

🙏🏽

andreadimaio commented 4 weeks ago

As far as I know, the latest version of Ollama doesn't support the use of tools when streaming mode is enabled.

geoand commented 4 weeks ago

You mean our integration or Ollama itself?

andreadimaio commented 4 weeks ago

Ollama itself

geoand commented 4 weeks ago

Interesting, thanks