Open ibehnam opened 7 months ago
I'm sorry for not getting back to you sooner. As far as I understand, Ollama does keep models in memory for some time. This defaults to 5 minutes but can be overridden through either an environment variable or the `keep_alive` API parameter. I guess we could add support for passing a custom keep-alive value through LLM's `--option` mechanism. However, I'm not sure it's worth the effort given that one can already customize keep-alive through the `OLLAMA_KEEP_ALIVE` environment variable.
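For anyone who wants the API route in the meantime, here's a minimal sketch of what a request with a custom `keep_alive` looks like. The model name and duration are just examples; `keep_alive` accepts a duration string (e.g. `"30m"`, `"24h"`), a number of seconds, or `-1` to keep the model loaded indefinitely.

```python
import json

# Sketch: build an Ollama /api/generate request body that overrides the
# default 5-minute keep-alive. "llama3" and "30m" are example values.
payload = {
    "model": "llama3",       # example model name
    "prompt": "Hello",
    "keep_alive": "30m",     # keep the model in memory for 30 minutes
}

body = json.dumps(payload)
# POST `body` to http://localhost:11434/api/generate with any HTTP client.
```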
Ollama doesn't keep the model in memory indefinitely by default. If we don't use the extension for a few minutes, the model gets unloaded from memory, so the next time we use the extension it has to be loaded again, which takes several seconds (sometimes a couple of minutes).

Ollama recently added a `use_mlock` flag for API requests, so it'd be nice to turn that on by default in this extension.
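For reference, `use_mlock` is a model option, so it goes inside the `options` object of the request body rather than at the top level; it asks the runtime to lock the model's weights in RAM so the OS won't page them out. A sketch (model name is an example):

```python
import json

# Sketch: enable use_mlock via the "options" object of an Ollama request.
payload = {
    "model": "llama3",                 # example model name
    "prompt": "Hello",
    "options": {"use_mlock": True},    # lock model weights in RAM
}

body = json.dumps(payload)
# POST `body` to http://localhost:11434/api/generate as usual.
```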