Isn't there an API call to detect whether the model is already present?
We already do that, so I'm interested in how you reproduce what you mention
If I already have the model present:

╰─ ollama ls
NAME                       ID              SIZE      MODIFIED
nomic-embed-text:latest    0a109f422b47    274 MB    39 minutes ago
mixtral:latest             d39eb76ed9c5    26 GB     39 minutes ago
When I run quarkus dev, I see:

Ollama model pull: 2024-05-31 14:39:40,154 INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (build-59) Preloading model mixtral

and it sits there for about 15 minutes.
There isn't any code that checks whether the model is already present; it instructs Ollama to re-pull the model.
If I also add -Dquarkus.langchain4j.devservices.preload=false, it skips that step and immediately starts my app, which works fine because the model is already present.
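Since this is an ordinary Quarkus config property, the same workaround can be made permanent in application.properties instead of passing the system property on every run:

# application.properties: disable dev-services model preloading
quarkus.langchain4j.devservices.preload=false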
It looks like the processor tries to see what local models are available:
Set<ModelName> localModels = client.localModels().stream()
        .map(mi -> ModelName.of(mi.name()))
        .collect(Collectors.toSet());
I'm not sure what this returns. All I know is that this block of code in the processor
if ((ollamaChatModels.size() == 1) && (config.devservices().preload())) {
    String modelName = ollamaChatModels.get(0).getModelName();
    LOGGER.infof("Preloading model %s", modelName);
    client.preloadChatModel(modelName);
}

is triggering Ollama to re-pull the model, which on my machine takes 15 minutes.
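A minimal sketch of the kind of guard being suggested, reusing the localModels set built above (the actual types and signatures in the processor may differ, and tag normalization, e.g. mixtral vs mixtral:latest, would likely need handling):

// Hypothetical guard: skip the pull when the model is already local.
if ((ollamaChatModels.size() == 1) && (config.devservices().preload())) {
    String modelName = ollamaChatModels.get(0).getModelName();
    if (localModels.contains(ModelName.of(modelName))) {
        // Already present, e.g. via an earlier `ollama pull` -- nothing to do.
        LOGGER.infof("Model %s already present locally, skipping pull", modelName);
    } else {
        LOGGER.infof("Preloading model %s", modelName);
        client.preloadChatModel(modelName);
    }
}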
That sounds like an Ollama bug TBH, but I'll try it on Monday
I tried this and preloading a model works exactly as expected; I could not reproduce the behavior you are seeing.
Closing as I cannot reproduce.
Feel free to add more information and I can have another look.
Sorry, I've been at an f2f this week. I'll be back in the office tomorrow.
When preloading an Ollama model, it should first check whether the model already exists. For example, the mixtral model takes almost 10 minutes to download/install. Isn't there an API call to detect whether the model is already present?
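For reference, Ollama's HTTP API does expose GET /api/tags, which lists the locally available models (the same data ollama ls shows). A hedged sketch of such a presence check, not the extension's actual implementation:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public final class OllamaModelCheck {

    // Returns true if the named model appears in Ollama's local model list.
    public static boolean isModelPresent(String baseUrl, String modelName) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/tags")).GET().build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        // Naive containment check on the raw JSON; a real implementation
        // should parse the "models" array and compare its "name" fields.
        return response.statusCode() == 200 && response.body().contains("\"" + modelName + "\"");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isModelPresent("http://localhost:11434", "mixtral:latest"));
    }
}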
Maybe additional logic in
https://github.com/quarkiverse/quarkus-langchain4j/blob/ccb3ce251d794cb0d781d6aed9adaec38db26c6e/core/deployment/src/main/java/io/quarkiverse/langchain4j/deployment/devservice/JdkOllamaClient.java#L127-L152
Also, why is Ollama-specific stuff inside the core deployment module? Shouldn't it belong in the ollama extension?