gptel-ollama: support customizing num_ctx

wangwb98 commented 1 month ago

Ollama num_ctx can be set in API request when modelfile default value is smaller than the model's max capability.

This patch removed the forced 8192 setting, and added customize-variable "gptel-ollama-num-ctx".

Main reason to name it as ollama specific variable instead of global gptel variable is it's only useful for ollama. Other backends should keep using gptel-max-tokens which includes token count for both request and response.

Not adding a transient item in gptel-menu, also because it's ollama specific. Users should either customize it, or set it in elisp before calling gptel-request etc functions.

wangwb98 commented 1 month ago

Hi @karthink , this pull request added a customize variable for ollama num_ctx, and removed the 8192 setting which prevents user to use models already has num_ctx > 8192. Background discussion see #330 Feel free to edit it or let me know what to change. THanks.

karthink commented 1 month ago

@wangwb98 Thanks for the PR! I'll look at it when I next have time to work on gptel.

karthink commented 2 weeks ago

@wangwb98 Since this is a backend-specific request parameter, i'd prefer to specify it along with the backend instead of as a top-level defvar. Do you think adding a :numctx keyword option to gptel-make-ollama makes sense? This way you can define a different Ollama backend with a different value of :numctx and switch on the fly, or use them simultaneously in different buffers.

Another question I had is whether this is something you want to be able to specify even more granularly, per model instead of per backend. Then you can set a different num_ctx for each model used in an Ollama backend.

I'm not using Ollama so I don't know how you use num_ctx.

karthink commented 1 week ago

@wangwb98 please see this comment in #330. I've added support for setting any Ollama request parameter per-backend or per-model.

See also the discussion in #471: defining variables like gptel-ollama-num-ctx does not scale, as there are hundreds of parameters (across all backends) that you might want to set.

If this approach is satisfactory we can close this PR.

wangwb98 commented 1 week ago

Totally agree with you, thanks!

wangwb98 commented 1 week ago

@wangwb98 Since this is a backend-specific request parameter, i'd prefer to specify it along with the backend instead of as a top-level defvar. Do you think adding a :numctx keyword option to gptel-make-ollama makes sense? This way you can define a different Ollama backend with a different value of :numctx and switch on the fly, or use them simultaneously in different buffers.

Another question I had is whether this is something you want to be able to specify even more granularly, per model instead of per backend. Then you can set a different num_ctx for each model used in an Ollama backend.

I'm not using Ollama so I don't know how you use num_ctx.

My recent preferred usage is to set this in each http request, thus I can roughly calculate out the number of characters in each request, and set in the http request. Actually your new solution (set either in backend level or model level) is already good enough, I personally plan to use your per-model setting in the future. Thanks for the follow up!

karthink / gptel

gptel-ollama: support customizing num_ctx #415