karthink / gptel

A simple LLM client for Emacs
GNU General Public License v3.0

How to pass additional options? E.g., num_ctx for Ollama #330

Closed: krvpal closed this issue 4 days ago

krvpal commented 1 week ago

Hi, thank you for this excellent package!

I'm using Ollama, and to set the context length it provides this option:

"options": {
    "num_ctx": 4096
  }

I'm struggling to use this in gptel. I understand there is a :curl-args keyword, but no variation like the one below seems to work. I've set gptel-log-level to debug and checked the logs, but the option does not show up.

  (defvar gptel--ollama
    (gptel-make-ollama "ollama"
      :host "localhost:12345"
      :protocol "http"
      :models '("llama3")
      :stream t
      :curl-args '("num_ctx:4096")))

Please help!

karthink commented 1 week ago

gptel uses Ollama's /api/chat endpoint, not /api/generate. I'm not sure if num_ctx is meaningful for Chat-style requests.

In any case, num_ctx is not explicitly supported by gptel. If you want to limit the context size, you can run the relevant /set parameter ... command in Ollama itself, or use the -n option in gptel's transient menu to limit the request to the last n responses:

[Screenshot: gptel's transient menu, showing the -n option to limit the request to the last n responses]


Only a subset of options common to most LLM APIs is exposed by gptel right now. I plan to eventually cover backend-specific parameters, such as num_ctx, but this won't be any time soon.

krvpal commented 1 week ago

Thanks for your response.

I'm looking to increase the default context length. Without passing num_ctx to Ollama, the context length is always 2048. This is a problem when I'm using gptel to summarize longer texts: the response I receive only considers the last paragraph.

I would like to increase the context length to 4096 or higher, as there are fine-tuned llama3 models capable of 8x the usual context length.

Being able to set num_ctx would make for a powerful use case: summarizing long texts in Org, Markdown, or any other text file. I'm already using the -n parameter as suggested.

I'm not sure how /set parameter ... would work, because I typically start the Ollama server and gptel handles the rest. I do not use ollama run, which is the command that accepts this option. I will explore configuring the Ollama server itself in the meantime.

Is using curl-args out of the question?

Thanks!

karthink commented 1 week ago

I'm looking to increase the default context length. Without passing num_ctx to Ollama, the context length is always 2048. This is a problem when I'm using gptel to summarize longer texts: the response I receive only considers the last paragraph.

I would like to increase the context length to 4096 or higher, as there are fine-tuned llama3 models capable of 8x the usual context length.

I understand.

Being able to set num_ctx would make for a powerful use case: summarizing long texts in Org, Markdown, or any other text file. I'm already using the -n parameter as suggested.

The -n parameter will help limit the context size further, but it will not increase the default, so you're right that it's not useful for your use case.

I'm not sure how /set parameter ... would work, because I typically start the Ollama server and gptel handles the rest. I do not use ollama run, which is the command that accepts this option. I will explore configuring the Ollama server itself in the meantime.

As stated above, there is currently no way to set this from gptel. Since you're running Ollama somewhere that you control (I'm assuming), you can set this parameter from the Ollama CLI instead. I'm not sure exactly how to do this either, but this looks promising. It also seems like you should be able to type /set parameter ... into the CLI when using ollama run.
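
If that works, an interactive session would look roughly like this (the model name is just an example):

  $ ollama run llama3
  >>> /set parameter num_ctx 4096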

Is using curl-args out of the question?

Thanks!

:curl-args is for supplying command-line arguments to Curl, such as for additional authentication, a web proxy, etc. It does not affect the contents of the JSON payload sent to Ollama.
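
For illustration, a legitimate use of :curl-args would be something like this (the proxy URL is only a placeholder):

  (gptel-make-ollama "ollama"
    :host "localhost:12345"
    :protocol "http"
    :stream t
    :models '("llama3")
    ;; Flags handed to the Curl process itself, e.g. to route through a proxy.
    ;; These never end up in the JSON body of the request.
    :curl-args '("--proxy" "http://localhost:3128"))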

krvpal commented 1 week ago

Got it! Thanks again.

I do control the Ollama server as it is running locally. I believe I should be able to create my own "modelfile" with the required parameters and use it as a custom model.

ollama run creates a fresh instance every time, so I do not think it will help in this case.
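
If I'm reading the Ollama docs right, the Modelfile would be something along these lines (the custom model name is just an example):

  # Modelfile
  FROM llama3
  PARAMETER num_ctx 8192

followed by ollama create llama3-8k -f Modelfile, after which "llama3-8k" could be listed under :models in the gptel-make-ollama definition.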

Frozenlock commented 1 week ago

@krvpal have you had any success? I'm also wondering how to increase the context length.

karthink commented 1 week ago

Until I can add official support for num_ctx to gptel, I can address this problem by picking a high (but not too high) default value sent with all requests to Ollama. What token count do you suggest? 2048? 4096? Higher?

karthink commented 1 week ago

I've added support for num_ctx via gptel-max-tokens. You can set this using the -c option in gptel's menu, or just set the variable directly. Please update, test and let me know.
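
Setting the variable directly would look something like this (pick a value appropriate for your model):

  (setq gptel-max-tokens 4096)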

karthink commented 1 week ago

I've added support for num_ctx via gptel-max-tokens. You can set this using the -c option in gptel's menu, or just set the variable directly. Please update, test and let me know.

I've had to revert this change since there's no way to control the response token limit without gptel-max-tokens. For now we'll have to settle for adding a high enough default num_ctx parameter to requests.

krvpal commented 4 days ago

@krvpal have you had any success? I'm also wondering how to increase the context length.

Hi @Frozenlock, I used a Modelfile to configure num_ctx and ran that model under a different name in Ollama. Here's a nice article you may find useful, and the official reference.

I've added support for num_ctx via gptel-max-tokens. You can set this using the -c option in gptel's menu, or just set the variable directly. Please update, test and let me know.

I've had to revert this change since there's no way to control the response token limit without gptel-max-tokens. For now we'll have to settle for adding a high enough default num_ctx parameter to requests.

Hi @karthink, the current llama3 model supports up to 8192 tokens. I'd recommend setting this maximum as the default for Ollama; if for any reason one needs to reduce the context, they can always use the -n option in gptel's menu.

karthink commented 4 days ago

Hi @karthink, the current llama3 model supports up to 8192 tokens. I'd recommend setting this maximum as the default for Ollama; if for any reason one needs to reduce the context, they can always use the -n option in gptel's menu.

Done, thank you for the suggestion.
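
With this change, requests to Ollama should now include something like:

  "options": {
    "num_ctx": 8192
  }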

I will close this issue now. I have a TODO item to make this customizable in the future, after gptel gets the ability to handle per-backend and per-model capabilities.