Closed philippart-s closed 5 months ago

It would be great if the Ollama model supported streaming mode. If I understand correctly, it is not currently supported, and after a quick look at the code I don't think I'm able to develop the streaming mode myself 😅.
Hi,
This is definitely doable. We do plan to improve the Ollama support soon, so I'll keep this in mind.
I tried to do it, taking inspiration from the other models, but to be honest I'm not sure I understand all the tasks involved 😅.
OpenAiRestApi#streamingCompletion is probably the best place to get inspiration. If you do want to take another look, I can certainly give you more pointers.
Oh, I was looking at the OpenAiRecorder class 😉.
Thanks for the information, and yes, if you have more pointers to help me implement this I would love that.
So the first thing to do is to add something like:
```java
@Path("/api/generate")
@RestStreamElementType(MediaType.APPLICATION_JSON)
@POST
Multi<CompletionResponse> generate(CompletionRequest request);
```
to OllamaRestApi.
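For the request/response types, simple DTOs mirroring the JSON of Ollama's /api/generate endpoint should do. A minimal sketch (the exact field set is an assumption; check the Ollama API docs for the full shape):

```java
// Request body for /api/generate; stream=true asks Ollama to send back
// incremental chunks instead of a single response.
public record CompletionRequest(String model, String prompt, Boolean stream) {
}

// One streamed chunk; "response" holds the partial text and "done" flips
// to true on the final element.
public record CompletionResponse(String model, String response, Boolean done) {
}
```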
I am not sure exactly what Ollama responds with when it streams back data, but if it's similar to what OpenAI does, you might need an @SseEventFilter to filter out the final element (which marks the completion of the stream).
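Such a filter is just a Predicate over the raw events. A sketch, assuming the final Ollama element can be recognized by a "done": true payload (the class name is hypothetical):

```java
import java.util.function.Predicate;

import io.quarkus.rest.client.reactive.SseEvent;

// Drops the terminating event so downstream code only sees real completion
// chunks; the exact payload check depends on what Ollama actually sends.
public class OllamaDoneFilter implements Predicate<SseEvent<String>> {

    @Override
    public boolean test(SseEvent<String> event) {
        return event.data() == null || !event.data().contains("\"done\":true");
    }
}
```

It would then be attached to the REST method with @SseEventFilter(OllamaDoneFilter.class).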
With that in place, you need to create OllamaStreamingChatLanguageModel, which implements StreamingLanguageModel. It should be fairly easy to map the Multi<CompletionResponse> to StreamingResponseHandler<String>.
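That mapping could look roughly like this (a sketch only; the constructor, the hardcoded model name, and the accessor names are assumptions, not the final implementation):

```java
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.language.StreamingLanguageModel;
import dev.langchain4j.model.output.Response;

public class OllamaStreamingChatLanguageModel implements StreamingLanguageModel {

    private final OllamaRestApi restApi; // the REST client defined above

    public OllamaStreamingChatLanguageModel(OllamaRestApi restApi) {
        this.restApi = restApi;
    }

    @Override
    public void generate(String prompt, StreamingResponseHandler<String> handler) {
        StringBuilder fullText = new StringBuilder();
        restApi.generate(new CompletionRequest("llama2", prompt, true))
                .subscribe().with(
                        chunk -> {
                            // forward each partial token as it arrives
                            fullText.append(chunk.response());
                            handler.onNext(chunk.response());
                        },
                        handler::onError,
                        // hand the accumulated text over once the stream completes
                        () -> handler.onComplete(Response.from(fullText.toString())));
    }
}
```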
With this in place, you can now use OllamaStreamingChatLanguageModel programmatically.
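For example (hypothetical wiring, printing tokens as they arrive):

```java
StreamingLanguageModel model = new OllamaStreamingChatLanguageModel(restApi);
model.generate("Why is the sky blue?", new StreamingResponseHandler<String>() {
    @Override
    public void onNext(String token) {
        System.out.print(token); // tokens arrive incrementally
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});
```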
The next step is to make it into a CDI bean. This requires a little work, but it's basically the exact same work as is done for OpenAiStreamingLanguageModel in OpenAiProcessor and OpenAiRecorder.
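The runtime side of that could be sketched like this (signatures and config handling are assumptions modeled on the OpenAI recorder, not the actual quarkus-langchain4j code; the matching processor would then register the supplier as a synthetic bean):

```java
import java.net.URI;
import java.util.function.Supplier;

import dev.langchain4j.model.language.StreamingLanguageModel;
import io.quarkus.rest.client.reactive.QuarkusRestClientBuilder;
import io.quarkus.runtime.annotations.Recorder;

@Recorder
public class OllamaRecorder {

    // Returns a supplier so the model is only instantiated when first injected.
    public Supplier<StreamingLanguageModel> streamingModel(String baseUrl) {
        return () -> {
            OllamaRestApi restApi = QuarkusRestClientBuilder.newBuilder()
                    .baseUri(URI.create(baseUrl))
                    .build(OllamaRestApi.class);
            return new OllamaStreamingChatLanguageModel(restApi);
        };
    }
}
```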
Thanks a lot, I'll try to do this and will push a draft PR when I have some code to show 😅
💪🏼
https://github.com/quarkiverse/quarkus-langchain4j/pull/522 is just the beginning of the work, but I'm sharing my code as early as possible to get early feedback and advice 😉