taketwo / llm-ollama

LLM plugin providing access to local Ollama models via the HTTP API
Apache License 2.0

Implementing `keep_alive` parameter #4

Status: Open

ibehnam commented 7 months ago

Ollama doesn't keep the model in memory by default. If we don't use the extension for a few minutes, the model gets unloaded from memory, which means the next time we use the extension the model has to be loaded again, and that takes several seconds (sometimes a couple of minutes).

Ollama recently added a `use_mlock` flag for API requests, so it would be nice for this extension to turn it on by default.
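For reference, here is a minimal sketch of where the two knobs live in Ollama's HTTP API: `keep_alive` is a top-level request field, while `use_mlock` goes inside `options`. The model name, prompt, and values below are just illustrative:

```python
import requests

# Direct call to a local Ollama server, showing where keep_alive
# and use_mlock go in the request body.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Why is the sky blue?",
        "stream": False,
        # Top-level keep_alive controls how long the model stays loaded
        # after this request; a negative value keeps it loaded indefinitely.
        "keep_alive": "30m",
        # use_mlock is a runner option asking the OS to lock the model
        # weights in RAM so they are not swapped out.
        "options": {"use_mlock": True},
    },
)
print(response.json()["response"])
```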

taketwo commented 6 months ago

I'm sorry for not getting back to you sooner. As far as I understand, Ollama does keep models in memory for some time: five minutes by default, which can be overridden through either an environment variable or the `keep_alive` API parameter. I guess we could add support for passing a custom keep-alive value through LLM's `--option` mechanism. However, I'm not sure it's worth the effort, given that one can already customize `keep_alive` through the `OLLAMA_KEEP_ALIVE` environment variable.
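If support were added, a rough sketch of how the plugin could expose this via LLM's standard pydantic-based `Options` class might look like the following. This is not the plugin's actual code; the field name and description are assumptions:

```python
import llm
from typing import Optional
from pydantic import Field


class Ollama(llm.Model):
    model_id = "llama3"

    # Hypothetical option, not part of llm-ollama today.
    class Options(llm.Options):
        keep_alive: Optional[str] = Field(
            default=None,
            description=(
                "How long Ollama keeps the model loaded after a request, "
                "e.g. '30m'; a negative value keeps it loaded indefinitely."
            ),
        )
```

Leaving the default at `None` means the plugin would only forward `keep_alive` to the API when the user sets it explicitly, e.g. `llm -m llama3 -o keep_alive 30m "..."`, so Ollama's own default (or a server-side `OLLAMA_KEEP_ALIVE=30m ollama serve`) would still apply otherwise.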