We do implement the streaming API; however, it's not used in AI services. Getting it there would be great, but it would likely be pretty hard to do
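Just to illustrate the goal, here is a hypothetical sketch of what streaming in an AI service might eventually look like. The `StreamingAssistant` interface and the `Multi<String>` return type are assumptions, not something the extension supports today:

```java
// Hypothetical sketch only: an AI service method that streams tokens as a
// Mutiny Multi instead of returning the full String at once.
// StreamingAssistant is an invented name; this is not currently supported.
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import io.smallrye.mutiny.Multi;

@RegisterAiService
public interface StreamingAssistant {

    Multi<String> chat(@UserMessage String question);
}
```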
> We do implement the streaming API
Do you mean on the RESTEasy Reactive side? Sure. What I meant is https://platform.openai.com/docs/api-reference/streaming, but I definitely don't mean to suggest it can be done easily :-), I totally agree. OpenAI pushes the responses via SSE, see also https://community.openai.com/t/what-is-this-new-streaming-parameter/391558/9
So, just a wild theory: https://github.com/quarkiverse/quarkus-langchain4j/blob/main/samples/chatbot/src/main/java/io/quarkiverse/langchain4j/sample/chatbot/ChatBotWebSocket.java#L40 could be done by connecting an SSE response from the chat bot to a RESTEasy Reactive SSE output (as opposed to a WebSocket), possibly in some other demo. See the rough sketch below.
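Something like this, as a rough sketch assuming the LangChain4j 0.x streaming API and a made-up `StreamingChatResource`; exact interface names vary across LangChain4j versions:

```java
// Hypothetical sketch only: bridges LangChain4j token streaming to an SSE
// endpoint via RESTEasy Reactive. StreamingChatResource and the injected
// model are assumptions, not part of the linked sample.
package io.quarkiverse.langchain4j.sample.chatbot;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.output.Response;
import io.smallrye.mutiny.Multi;
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.QueryParam;
import jakarta.ws.rs.core.MediaType;

@Path("/chatbot")
public class StreamingChatResource {

    @Inject
    StreamingChatLanguageModel model; // e.g. the OpenAI streaming model

    @GET
    @Path("/stream")
    @Produces(MediaType.SERVER_SENT_EVENTS)
    public Multi<String> chat(@QueryParam("message") String message) {
        // RESTEasy Reactive sends each item of the Multi as one SSE event,
        // so every token the model pushes reaches the client immediately.
        return Multi.createFrom().emitter(emitter ->
                model.generate(message, new StreamingResponseHandler<AiMessage>() {

                    @Override
                    public void onNext(String token) {
                        emitter.emit(token);
                    }

                    @Override
                    public void onComplete(Response<AiMessage> response) {
                        emitter.complete();
                    }

                    @Override
                    public void onError(Throwable error) {
                        emitter.fail(error);
                    }
                }));
    }
}
```

The WebSocket demo linked above would look much the same, just writing each token to the session instead of emitting SSE events.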
Apparently (all or some of) the OpenAI APIs now have a boolean streaming option.
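For reference, the option is the boolean `stream` field on the request body; with it set, the chat completions endpoint answers with SSE `data:` chunks, roughly like this (abridged):

```
POST /v1/chat/completions
{ "model": "gpt-3.5-turbo", "stream": true, "messages": [ ... ] }

data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}

data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
```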
But in any case, I agree it may not be straightforward
I meant that we implement `StreamingChatModel` from LangChain4j
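For context, a minimal sketch of using that interface directly, assuming 0.x-era names like `StreamingChatLanguageModel` and `StreamingResponseHandler` (these have shifted between releases):

```java
// Sketch assuming the LangChain4j 0.x API: print tokens as the model pushes them.
import java.util.concurrent.CountDownLatch;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiStreamingChatModel;
import dev.langchain4j.model.output.Response;

public class StreamingDemo {

    public static void main(String[] args) throws InterruptedException {
        StreamingChatLanguageModel model =
                OpenAiStreamingChatModel.withApiKey(System.getenv("OPENAI_API_KEY"));

        CountDownLatch done = new CountDownLatch(1);
        model.generate("Why is the sky blue?", new StreamingResponseHandler<AiMessage>() {

            @Override
            public void onNext(String token) {
                System.out.print(token); // tokens arrive one by one
            }

            @Override
            public void onComplete(Response<AiMessage> response) {
                done.countDown();
            }

            @Override
            public void onError(Throwable error) {
                error.printStackTrace();
                done.countDown();
            }
        });
        done.await(); // generate() is asynchronous, so wait for the stream to end
    }
}
```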
The hard part is making it work with AI services
I am actually going to close this in favor of https://github.com/quarkiverse/quarkus-langchain4j/issues/105, which is more targeted
@geoand Yeah, that is better, thanks
Hopefully it can be considered worth investigating. I've read around quite a few related discussions, and one of the main techniques for getting faster OpenAI response times is apparently to support a streaming API. For example, with a chat bot sample, users would see the response being formed gradually, word by word or sentence by sentence, minimising the effect of a somewhat slow response. Thanks