quarkiverse / quarkus-langchain4j

Quarkus LangChain4j extension
https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html
Apache License 2.0

Support streaming in dev-ui chat #590

Status: Closed (iocanel closed this 3 weeks ago)

iocanel commented 1 month ago

The pull request introduces a new option (streaming chat). The option is enabled when the model supports streaming and RAG is disabled (streaming chat can't work with RAG at the moment). When the user enables streaming chat, responses from the model are streamed to the Dev UI chat screen.

jmartisk commented 1 month ago

I'm at a conference so I'll have time to test it later, but I'll just throw in that, in case of a timeout, or whatever other error from the LLM, are we sure that we don't get the history in the Dev UI desynchronized with the one stored in the actual chat memory? That's why the original synchronous variant of the chat method always returns the whole history - so the Dev UI can replace the whole history by that, and prevent desynchronization. I think this might be a little tricky to implement robustly when you just stream a single message.

iocanel commented 1 month ago

> I'm at a conference so I'll have time to test it later, but I'll just throw in that, in case of a timeout, or whatever other error from the LLM, are we sure that we don't get the history in the Dev UI desynchronized with the one stored in the actual chat memory? That's why the original synchronous variant of the chat method always returns the whole history - so the Dev UI can replace the whole history by that, and prevent desynchronization. I think this might be a little tricky to implement robustly when you just stream a single message.

The memory gets modified when onComplete is called, with the whole message. In case of an error, the memory is restored from the backup, in a way similar to regular chat. Do you think we need to do more?
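The backup-and-restore approach described above could be sketched roughly like this (hypothetical names and shapes, not the extension's actual code): snapshot the memory before the request, commit the whole assembled message in onComplete, and roll back to the snapshot in onError so the Dev UI history never desynchronizes.

```javascript
// Sketch, assuming a stream(message, handlers) API on the model wrapper.
function streamChat(memory, userMessage, llm) {
  const backup = [...memory];            // snapshot before the request starts
  memory.push({ role: 'user', text: userMessage });
  let assembled = '';
  return new Promise((resolve) => {
    llm.stream(userMessage, {
      onNext: (token) => { assembled += token; },
      onComplete: () => {                // commit the whole message at once
        memory.push({ role: 'ai', text: assembled });
        resolve({ ok: true });
      },
      onError: () => {                   // roll back to the snapshot
        memory.length = 0;
        memory.push(...backup);
        resolve({ ok: false });
      },
    });
  });
}
```

The key property is that partial tokens are never written to the memory, so an error mid-stream leaves it exactly as it was before the request.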

geoand commented 1 month ago

I plan on doing a 0.14.0 release today, so we can leave this for 0.14.1 once @jmartisk gives his final approval.

jmartisk commented 1 month ago

> The memory gets modified when onComplete is called, with the whole message. In case of an error the memory is restored from the backup in a way similar to regular chat. Do you think we need to do more?

I'm trying to cause an error by setting a short timeout (I used Ollama and set quarkus.langchain4j.ollama.timeout=1s). Then, after I ask the LLM for something, the Quarkus log shows an error:

2024-05-20 12:12:24,707 ERROR [io.qua.dev.run.jso.JsonRpcCodec] (vert.x-eventloop-thread-2) Error in JsonRPC Call: io.vertx.core.impl.NoStackTraceTimeoutException: The timeout period of 1000ms has been exceeded while executing POST /api/chat for server null

but the UI doesn't receive or show anything, and the progress bar keeps going forever. Ideally, the JSON-RPC service should send an error and the page should discard the previous user message.

iocanel commented 1 month ago

I updated the PR and added error handling in all the ways I could think of.

Even though the error reaches the UI, as far as I can tell from the console, I somehow can't capture it in my JavaScript code. @phillip-kruger any idea what I am doing wrong here?

iocanel commented 1 month ago

@jmartisk: I checked what you do in qwc-chat.js and it's not applicable to this case. Here the JSON-RPC call returns a Multi, so errors are not caught with a plain catch; they should be caught using onError, but the errors are not propagated. @phillip-kruger mentioned that it is most probably an issue in the JSON-RPC implementation.
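The distinction above can be illustrated with a minimal sketch (hypothetical client, not the actual Dev UI JSON-RPC code): when the response is a stream, the failure happens asynchronously, so a try/catch around the subscription call never fires; the failure has to be delivered to an onError callback instead.

```javascript
// Simulate a streaming JSON-RPC call that emits one token,
// then fails asynchronously (e.g. a server-side timeout).
function streamingCall(handlers) {
  setTimeout(() => handlers.onNext('partial '), 0);
  setTimeout(() => handlers.onError(new Error('timeout')), 10);
}

let caughtByTryCatch = false;
let caughtByOnError = null;
try {
  streamingCall({
    onNext: () => {},
    onError: (e) => { caughtByOnError = e.message; },
  });
} catch (e) {
  caughtByTryCatch = true; // never reached: the error happens after the call returns
}
```

If the implementation never invokes onError on the subscriber, the error is effectively swallowed, which matches the behavior described in this thread.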

phillip-kruger commented 1 month ago

I have not looked at this yet, but in the meantime, can you check whether anything is printed in: 1) the browser console, or 2) the Dev UI log (the JSON message tab)?

jmartisk commented 1 month ago

I'm now realizing this will also cause problems when tools are used, not just RAG, no? Because the streaming response from the server will not contain tool-related messages. :( To fix that, we would have to check the chat memory for changes (for the tool messages to appear in it) while the streaming is still going on, and somehow inject them into the stream (to send the augmented message with RAG, we'd actually have to do something similar).
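The polling idea suggested above could look roughly like this sketch (hypothetical message shapes and function names, not real extension code): while tokens stream, periodically scan the chat memory past a high-water mark and forward any tool-related messages to the Dev UI alongside the token events.

```javascript
// Forward any tool messages appended to the memory since the last poll.
// Returns the new high-water mark so the next poll only sees new entries.
function pollForToolMessages(memory, seenCount, emit) {
  for (let i = seenCount; i < memory.length; i++) {
    if (memory[i].role === 'tool') {
      emit({ type: 'tool-message', message: memory[i] });
    }
  }
  return memory.length;
}
```

This is only a sketch of the concept; a robust version would also need to handle ordering relative to the streamed tokens and to stop polling once the stream completes or fails.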

geoand commented 1 month ago

Yeah, that sounds right

iocanel commented 3 weeks ago

@jmartisk: the issue you had with errors being swallowed has been fixed on the Quarkus side of the house by @phillip-kruger and should be available in 3.11.1. Nothing additional is needed here.

Now, regarding your comment about tools, the suggested solution is not clear to me. Is it something that is required for this PR or something we could add later on?

jmartisk commented 3 weeks ago

It would be nice to have RAG and tools working properly with streaming, but if you don't want to do it I can have a look. I have some heavier refactoring planned anyway, to be able to support images.