Closed jasalt closed 1 day ago
There are hundreds of options (across backends), so exposing them all as gptel configuration variables does not scale. Thankfully what you're looking for can be done easily when defining the Ollama backend:
```elisp
(gptel-make-ollama "Ollama"
  :host "localhost:11434"
  :models '(openhermes:latest)
  :stream t
  :request-params '(:keep_alive "60m"))
```
You can add any other request parameters you need to `:request-params` this way. If you want different `keep_alive` settings for different models, you can specify it per model instead. See #330 and #471 for more details and examples.
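Per-model settings can be sketched roughly as follows, assuming gptel accepts a `:request-params` key inside a model plist as discussed in the linked issues (the `llama3:70b` model name here is just an illustrative placeholder):

```elisp
;; Sketch: different keep_alive per model via model plists,
;; assuming per-model :request-params support (see #330 and #471).
(gptel-make-ollama "Ollama"
  :host "localhost:11434"
  :stream t
  :models '((openhermes:latest
             ;; unload the small model after an hour of inactivity
             :request-params (:keep_alive "60m"))
            (llama3:70b  ; hypothetical large model for illustration
             ;; -1 keeps the model loaded indefinitely
             :request-params (:keep_alive -1))))
```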
Right, it did not come to my mind to look for a generic `:request-params` config option, thanks. Seems like a good solution.
The time an Ollama model stays loaded in memory after a request can be set as a parameter to the Ollama chat API: https://github.com/ollama/ollama/pull/2146#issuecomment-2282889389. Maybe it would be a useful addition to gptel's settings, if it makes sense. It defaults to 5 minutes, which gets annoying with bigger models.
Alternatively, it can be set with the environment variable `OLLAMA_KEEP_ALIVE=<SECONDS>` before running `ollama serve`.
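For reference, a minimal sketch of the server-side alternative; note that Ollama also accepts duration strings (e.g. `"60m"`, `"24h"`) and `-1` for this variable, not only raw seconds:

```shell
# Set the default keep_alive for all models served by this Ollama instance,
# then start the server. 3600 seconds = 1 hour.
export OLLAMA_KEEP_ALIVE=3600
ollama serve
```

This applies server-wide, whereas the `:request-params` approach lets gptel control it per request.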