Closed wangwb98 closed 1 week ago
Hi @karthink, this pull request adds a customizable variable for Ollama's num_ctx and removes the hard-coded 8192 setting, which prevents users from using models that already have num_ctx > 8192. For background discussion, see #330. Feel free to edit it or let me know what to change. Thanks.
@wangwb98 Thanks for the PR! I'll look at it when I next have time to work on gptel.
@wangwb98 Since this is a backend-specific request parameter, I'd prefer to specify it along with the backend instead of as a top-level defvar. Do you think adding a :numctx keyword option to gptel-make-ollama makes sense? This way you can define a different Ollama backend with a different value of :numctx and switch on the fly, or use them simultaneously in different buffers.
Another question I had is whether this is something you want to be able to specify even more granularly, per model instead of per backend. Then you can set a different num_ctx for each model used in an Ollama backend.
I'm not using Ollama so I don't know how you use num_ctx.
@wangwb98 please see this comment in #330. I've added support for setting any Ollama request parameter per-backend or per-model.
See also the discussion in #471: defining variables like gptel-ollama-num-ctx does not scale, as there are hundreds of parameters (across all backends) that you might want to set.
If this approach is satisfactory we can close this PR.
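For readers following along, the per-backend and per-model setup described above might look roughly like the sketch below. It assumes that gptel-make-ollama accepts a :request-params plist both on the backend and on individual model entries, and that Ollama expects num_ctx nested under an options object. The backend name, model names, and context sizes are placeholders, not values taken from this thread.

```elisp
;; Sketch only: model names and num_ctx values are illustrative placeholders.
(gptel-make-ollama "Ollama"
  :host "localhost:11434"            ; default Ollama address
  :stream t
  :models '(llama3.1:latest          ; uses the backend-wide value below
            ;; Per-model setting (assumed to take precedence for this model):
            (mistral:latest
             :request-params (:options (:num_ctx 16384))))
  ;; Backend-wide setting, sent with requests to any model in this backend:
  :request-params '(:options (:num_ctx 8192)))
```

Defining a second backend with a different :request-params value would give the "switch on the fly" behavior mentioned earlier in the thread.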
Totally agree with you, thanks!
My recent preferred usage is to set this in each HTTP request: I can roughly estimate the size of each request from its character count and set num_ctx accordingly. Your new solution (setting it at either the backend level or the model level) is already good enough, and I personally plan to use the per-model setting in the future. Thanks for the follow-up!
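As a rough illustration of the per-request estimate described in this comment, here is a minimal sketch. The function names, the 4-characters-per-token ratio, the 1024-token reply allowance, and the rounding to multiples of 2048 are all assumptions made for the example; none of them come from gptel or Ollama.

```elisp
;; Hypothetical helpers, not part of gptel.
(defun my/ollama-estimate-num-ctx (prompt &optional reply-tokens)
  "Return a rough num_ctx for PROMPT, a string.
Assumes about 4 characters per token and reserves REPLY-TOKENS
\(default 1024) for the response.  The result is rounded up to
the next multiple of 2048."
  (let* ((prompt-tokens (ceiling (length prompt) 4))
         (needed (+ prompt-tokens (or reply-tokens 1024))))
    (* 2048 (ceiling needed 2048))))

(defun my/ollama-num-ctx-params (prompt)
  "Build a request-parameter plist carrying the num_ctx estimate for PROMPT."
  (list :options (list :num_ctx (my/ollama-estimate-num-ctx prompt))))

;; Example: a 10000-character prompt
;; (my/ollama-num-ctx-params (make-string 10000 ?x))
;; => (:options (:num_ctx 4096))
```

How such a plist would be attached to an individual request is left open here; the per-backend and per-model settings discussed in this thread avoid the need for it.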
Ollama's num_ctx can be set in the API request when the Modelfile default is smaller than the model's maximum capability.
This patch removes the forced 8192 setting and adds the customizable variable "gptel-ollama-num-ctx".
The main reason for naming it as an Ollama-specific variable rather than a global gptel variable is that it is only useful for Ollama. Other backends should keep using gptel-max-tokens, which includes the token count for both request and response.
No transient item is added to gptel-menu, also because it is Ollama-specific. Users should either customize it or set it in Elisp before calling gptel-request and similar functions.