Closed philippart-s closed 5 months ago

It would be great if the Ollama model supported streaming mode. If I understand correctly, it is not currently supported, and after a quick look at the code I don't think I'm able to develop the streaming mode myself 😅.
Hi,
This is definitely doable. We do plan to improve the Ollama support soon, so I'll keep this in mind.
I tried to do it, taking inspiration from the other models, but to be honest I'm not sure I understand all the tasks involved 😅.
OpenAiRestApi#streamingCompletion is probably the best place to get inspiration. If you do want to take another look, I can certainly give you more pointers.
Oh, I was looking at the OpenAiRecorder class 😉.
Thanks for the information, and yes, if you have more pointers to help me implement this I would love that.
So the first thing to do is to add something like:
```java
@Path("/api/generate")
@RestStreamElementType(MediaType.APPLICATION_JSON)
@POST
Multi<CompletionResponse> generate(CompletionRequest request);
```
to OllamaRestApi.
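For the request/response types, simple DTOs mirroring the JSON of Ollama's /api/generate endpoint should do. A minimal sketch (the exact field set is an assumption; check the Ollama API docs for the full shape):

```java
// Request body for /api/generate; stream=true asks Ollama to send back
// incremental chunks instead of a single response.
public record CompletionRequest(String model, String prompt, Boolean stream) {
}

// One streamed chunk; "response" holds the partial text and "done" flips
// to true on the final element.
public record CompletionResponse(String model, String response, Boolean done) {
}
```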
I am not sure exactly what Ollama responds with when it streams back data, but if it's similar to what OpenAI does, you might need an @SseEventFilter to filter out the final element (which marks the completion of the stream).
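Such a filter is just a Predicate over the raw events. A sketch, assuming the final Ollama element can be recognized by a "done": true payload (the class name is hypothetical):

```java
import java.util.function.Predicate;

import io.quarkus.rest.client.reactive.SseEvent;

// Drops the terminating event so downstream code only sees real completion
// chunks; the exact payload check depends on what Ollama actually sends.
public class OllamaDoneFilter implements Predicate<SseEvent<String>> {

    @Override
    public boolean test(SseEvent<String> event) {
        return event.data() == null || !event.data().contains("\"done\":true");
    }
}
```

It would then be attached to the REST method with @SseEventFilter(OllamaDoneFilter.class).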
With that in place, you need to create OllamaStreamingChatLanguageModel, which implements StreamingLanguageModel. It should be fairly easy to map the Multi<CompletionResponse> to StreamingResponseHandler<String>.
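That mapping could look roughly like this (a sketch only; the constructor, the hardcoded model name, and the accessor names are assumptions, not the final implementation):

```java
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.language.StreamingLanguageModel;
import dev.langchain4j.model.output.Response;

public class OllamaStreamingChatLanguageModel implements StreamingLanguageModel {

    private final OllamaRestApi restApi; // the REST client defined above

    public OllamaStreamingChatLanguageModel(OllamaRestApi restApi) {
        this.restApi = restApi;
    }

    @Override
    public void generate(String prompt, StreamingResponseHandler<String> handler) {
        StringBuilder fullText = new StringBuilder();
        restApi.generate(new CompletionRequest("llama2", prompt, true))
                .subscribe().with(
                        chunk -> {
                            // forward each partial token as it arrives
                            fullText.append(chunk.response());
                            handler.onNext(chunk.response());
                        },
                        handler::onError,
                        // hand the accumulated text over once the stream completes
                        () -> handler.onComplete(Response.from(fullText.toString())));
    }
}
```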
With this in place, you can now use OllamaStreamingChatLanguageModel programmatically.
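For example (hypothetical wiring, printing tokens as they arrive):

```java
StreamingLanguageModel model = new OllamaStreamingChatLanguageModel(restApi);
model.generate("Why is the sky blue?", new StreamingResponseHandler<String>() {
    @Override
    public void onNext(String token) {
        System.out.print(token); // tokens arrive incrementally
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});
```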
The next step is to make it into a CDI bean. This requires a little work, but it's basically the exact same work as is done for OpenAiStreamingLanguageModel in OpenAiProcessor and OpenAiRecorder.
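The runtime side of that could be sketched like this (signatures and config handling are assumptions modeled on the OpenAI recorder, not the actual quarkus-langchain4j code; the matching processor would then register the supplier as a synthetic bean):

```java
import java.net.URI;
import java.util.function.Supplier;

import dev.langchain4j.model.language.StreamingLanguageModel;
import io.quarkus.rest.client.reactive.QuarkusRestClientBuilder;
import io.quarkus.runtime.annotations.Recorder;

@Recorder
public class OllamaRecorder {

    // Returns a supplier so the model is only instantiated when first injected.
    public Supplier<StreamingLanguageModel> streamingModel(String baseUrl) {
        return () -> {
            OllamaRestApi restApi = QuarkusRestClientBuilder.newBuilder()
                    .baseUri(URI.create(baseUrl))
                    .build(OllamaRestApi.class);
            return new OllamaStreamingChatLanguageModel(restApi);
        };
    }
}
```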
Thanks a lot, I'll try to do this and will push a draft PR when I have some code to show 😅
💪🏼
https://github.com/quarkiverse/quarkus-langchain4j/pull/522 is just the beginning of the work, but I'm sharing my code as early as possible to get early feedback and advice 😉