quarkiverse / quarkus-langchain4j

Quarkus Langchain4j extension
https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html
Apache License 2.0

Preloading ollama model should only happen if the model doesn't already exist #647

Closed edeandrea closed 4 weeks ago

edeandrea commented 1 month ago

When preloading an Ollama model, it should first check to see whether or not the model already exists.

For example, if I'm using the mixtral model, it takes almost 10 minutes to download/install. Isn't there an API call to detect whether the model is already present?

Maybe additional logic in

https://github.com/quarkiverse/quarkus-langchain4j/blob/ccb3ce251d794cb0d781d6aed9adaec38db26c6e/core/deployment/src/main/java/io/quarkiverse/langchain4j/deployment/devservice/JdkOllamaClient.java#L127-L152
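For context, Ollama does expose such an API: `GET /api/tags` lists the locally installed models. The sketch below shows the shape of the check being asked for; it parses a hardcoded sample response with a regex purely for illustration (a real client would issue the HTTP request and use a JSON library), and the sample payload mirrors the `ollama ls` output later in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: extract the "name" fields from an
// Ollama GET /api/tags response body.
public class OllamaTagsSketch {

    static List<String> modelNames(String tagsJson) {
        List<String> names = new ArrayList<>();
        Matcher m = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]+)\"").matcher(tagsJson);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample payload matching the models reported in this issue
        String sample = """
                {"models":[{"name":"nomic-embed-text:latest"},{"name":"mixtral:latest"}]}
                """;
        System.out.println(modelNames(sample)); // [nomic-embed-text:latest, mixtral:latest]
    }
}
```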

Also, why is Ollama-specific stuff inside the core deployment module? Shouldn't it belong in the ollama extension?

geoand commented 1 month ago

Isn't there an API call to detect whether the model is already present?

We already do that, so I'm interested in how you reproduced what you're describing.

edeandrea commented 1 month ago

If I already have the model present:

╰─ ollama ls
NAME                    ID              SIZE    MODIFIED       
nomic-embed-text:latest 0a109f422b47    274 MB  39 minutes ago  
mixtral:latest          d39eb76ed9c5    26 GB   39 minutes ago  

When I run quarkus dev I see

Ollama model pull: 2024-05-31 14:39:40,154 INFO  [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (build-59) Preloading model mixtral

And it sits there for about 15 minutes.

There isn't any code that reaches out to see if the model is already present. It's instructing Ollama to re-pull the model.

https://github.com/quarkiverse/quarkus-langchain4j/blob/ccb3ce251d794cb0d781d6aed9adaec38db26c6e/core/deployment/src/main/java/io/quarkiverse/langchain4j/deployment/devservice/DevServicesOllamaProcessor.java#L144-L148

If I also add -Dquarkus.langchain4j.devservices.preload=false, it skips that step and immediately starts my app, which works fine, because the model is already loaded.

It looks like the processor tries to see what local models are available:

Set<ModelName> localModels = client.localModels().stream()
        .map(mi -> ModelName.of(mi.name()))
        .collect(Collectors.toSet());

I'm not sure what this returns. All I know is that this block of code in the processor

            if ((ollamaChatModels.size() == 1) && (config.devservices().preload())) {
                String modelName = ollamaChatModels.get(0).getModelName();
                LOGGER.infof("Preloading model %s", modelName);
                client.preloadChatModel(modelName);
            }

is triggering ollama to re-pull the model, which on my machine takes 15 minutes.
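One hedged guess at the guard logic, using stand-ins for the extension's types (this `ModelName` record and `needsPull` helper are hypothetical, not the extension's actual code): the `ollama ls` output above reports `mixtral:latest`, while the configured name is `mixtral`, so any presence check would need to normalize the default `latest` tag before comparing.

```java
import java.util.Set;

// Hypothetical sketch: only trigger a pull when the configured model
// is absent from the local list, normalizing Ollama's implicit
// ":latest" tag so "mixtral" matches "mixtral:latest".
public class GuardedPreload {

    record ModelName(String repository, String tag) {
        static ModelName of(String raw) {
            int idx = raw.indexOf(':');
            // Ollama defaults the tag to "latest" when none is given
            return idx < 0 ? new ModelName(raw, "latest")
                           : new ModelName(raw.substring(0, idx), raw.substring(idx + 1));
        }
    }

    // Returns true when a pull is actually needed
    static boolean needsPull(Set<ModelName> localModels, String configuredName) {
        return !localModels.contains(ModelName.of(configuredName));
    }

    public static void main(String[] args) {
        Set<ModelName> local = Set.of(ModelName.of("mixtral:latest"),
                                      ModelName.of("nomic-embed-text:latest"));
        System.out.println(needsPull(local, "mixtral"));   // false: already present
        System.out.println(needsPull(local, "codellama")); // true: must pull
    }
}
```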

geoand commented 1 month ago

That sounds like an Ollama bug TBH, but I'll try it on Monday

geoand commented 1 month ago

I tried this and preloading a model works exactly as expected; I could not reproduce the behavior you are seeing.

geoand commented 4 weeks ago

Closing as I cannot reproduce.

Feel free to add more information and I can have another look.

edeandrea commented 3 weeks ago

Sorry, I've been at an f2f this week. I'll be back in the office tomorrow.