Closed: krvpal closed this issue 4 days ago
gptel uses Ollama's `/api/chat` endpoint, not `/api/generate`. I'm not sure if `num_ctx` is meaningful for chat-style requests. In any case, `num_ctx` is not explicitly supported by gptel. If you want to limit the context size, you can run the relevant `/set parameter ...` command in Ollama itself, or use the `-n` option in gptel's transient menu to limit the request to the last n responses.

Only a subset of the options common to most LLM APIs is exposed by gptel right now. I plan to eventually cover backend-specific parameters such as `num_ctx`, but this won't be any time soon.
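For reference, the `/set parameter ...` route goes through Ollama's interactive CLI, roughly like this (an illustrative session; the model name is just an example):

```
$ ollama run llama3
>>> /set parameter num_ctx 4096
```

Note that this applies to the interactive session started by `ollama run`, not to the server that gptel talks to.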
Thanks for your response.
I'm looking to increase the default context length. Without passing `num_ctx` to Ollama, the context length is always set to 2048. This is a problem when I'm using gptel to summarize text that is longer than that, since the response I receive only considers the last paragraph. I would like to increase the context length to 4096 or higher, as there are fine-tuned llama3 models capable of 8x the usual context length.

This would be a powerful use case for summarizing long texts in Org, Markdown, or any text file if one could set `num_ctx`. I'm already using the `-n` parameter as suggested.
I'm not sure how `/set parameter ...` would work, because I typically start the Ollama server and gptel handles the rest. I do not use `ollama run`, which is the command that accepts this option. I will explore configuring the Ollama server itself in the meantime.

Is using `curl-args` out of the question?

Thanks!
> I'm looking to increase the default context length. Without passing `num_ctx` to Ollama, the context length is always set to 2048. This is a problem when I'm using gptel to summarize text, which is longer and the response I receive only considers the last paragraph. I would like to increase context length to 4096 or higher, as there are fine tuned llama3 models that are capable of 8x the usual context length.
I understand.
> This is a powerful use case to summarize long texts in org, markdown or any text file if one could set `num_ctx`. I'm using the `-n` parameter already as suggested.
The `-n` parameter will help limit the context size further, but not increase the default, so you're right that it isn't useful for your use case.
> I'm not sure how `/set parameter ...` would work, because I typically start the Ollama server and gptel handles the rest. I do not use `ollama run` which is the command that accepts this option. I will explore configuring Ollama server itself in the meantime.
As stated above, there is currently no way to set this using gptel. Since you're running Ollama somewhere you control (I'm assuming), you can set this parameter from the Ollama CLI instead. I'm not sure exactly how to do this either, but this looks promising. It also seems like you should be able to type `/set parameter ...` into the CLI when using `ollama run`.
> Is using `curl-args` out of the question? Thanks!
`:curl-args` is for supplying command-line arguments to curl, such as for additional authentication, a web proxy, etc. It does not affect the contents of the JSON packet sent to Ollama.
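For illustration, here is the kind of thing `:curl-args` is meant for when defining a backend (a sketch; the proxy address is made up, and again this does not touch the JSON body):

```elisp
;; Hypothetical proxy setup via extra curl flags -- NOT a way to set num_ctx.
(gptel-make-ollama "Ollama"
  :host "localhost:11434"
  :stream t
  :models '("llama3")
  :curl-args '("--proxy" "http://proxy.example.com:8080"))
```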
Got it! Thanks again.
I do control the Ollama server, as it is running locally. I believe I should be able to create my own Modelfile with the required parameters and use it as a custom model. `ollama run` creates a fresh instance every time, so I do not think it will help in this case.
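The Modelfile approach described here would look roughly like this (a sketch; the base model and custom name are placeholders):

```
# Modelfile
FROM llama3
PARAMETER num_ctx 8192
```

Then `ollama create llama3-8k -f Modelfile` registers the custom model, and pointing gptel at `llama3-8k` picks up the larger context window.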
@krvpal have you had any success? I'm also wondering how to increase the context length.
Until I can add official support for `num_ctx` to gptel, I can address this problem by picking a high (but not too high) default value sent with all requests to Ollama. What token count do you suggest? 2048? 4096? Higher?
I've added support for `num_ctx` via `gptel-max-tokens`. You can set this using the `-c` option in gptel's menu, or just set the variable directly. Please update, test and let me know.
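Concretely, setting the variable directly looks like this (the value here is just an example):

```elisp
;; In your init file; per the message above, this is passed to Ollama
;; as num_ctx.
(setq gptel-max-tokens 4096)
```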
> I've added support for `num_ctx` via `gptel-max-tokens`. You can set this using the `-c` option in gptel's menu, or just set the variable directly. Please update, test and let me know.

I've had to revert this change since there's no way to control the response token limit without `gptel-max-tokens`. For now we'll have to settle for adding a high enough default `num_ctx` parameter to requests.
> @krvpal have you had any success? I'm also wondering how to increase the context length.
Hi @Frozenlock, I used a Modelfile to configure `num_ctx` and run that model under a different name in Ollama. Here's a nice article you may find useful, and the official reference.
> I've added support for `num_ctx` via `gptel-max-tokens`. You can set this using the `-c` option in gptel's menu, or just set the variable directly. Please update, test and let me know.
>
> I've had to revert this change since there's no way to control the response token limit without `gptel-max-tokens`. For now we'll have to settle for adding a high enough default `num_ctx` parameter to requests.
Hi @karthink, the current llama3 model supports up to 8192 tokens. I'd recommend setting this maximum value as the default for Ollama; if for any reason one needs to reduce the context tokens, they can always use the `-n` option in gptel's menu.
> Hi @karthink, current llama3 model supports up to 8192 tokens. I'd recommend to set this max value itself as the default for Ollama, and if for any reason one needs to reduce the context tokens, they can always use the `-n` option in gptel's menu.
Done, thank you for the suggestion.
I will close this issue now. I have a TODO item to make this customizable in the future, after gptel gets the ability to handle per-backend and per-model capabilities.
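With that default in place, the request body would look something like the sketch below (in Python; `ollama_chat_payload` is a hypothetical helper, not part of gptel, shown only to illustrate where `num_ctx` sits under Ollama's `options` key):

```python
import json

def ollama_chat_payload(model, messages, num_ctx=8192):
    """Sketch of a JSON body for Ollama's /api/chat endpoint.

    Hypothetical helper for illustration; gptel builds its own payload.
    """
    return json.dumps({
        "model": model,
        "messages": messages,
        "stream": False,
        # Model options, including the context window, go in a
        # nested "options" object.
        "options": {"num_ctx": num_ctx},
    })

payload = ollama_chat_payload(
    "llama3", [{"role": "user", "content": "Summarize this file."}])
```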
Hi, thank you for this excellent package!
I'm using Ollama, and to set the context length, Ollama provides this option:
I'm struggling to use this in gptel. I understand there is a `curl-args` keyword, but any variation like the one below does not seem to work. I've set `gptel-log-level` to `debug` and checked the logs, but the option does not show up. Please help!