karthink / gptel

A simple LLM client for Emacs
GNU General Public License v3.0

Separate system prompts and directives #249

Open jwr opened 3 months ago

jwr commented 3 months ago

This is a feature request based on several days of using gptel with various models. I am not trying to implement this and submit a pull request, because I think this needs designing first, and it's likely my design choices would be different.

Current problems:

  • there is no separation between the task/directive prompt (e.g. "Rewrite this paragraph") and the system prompt (providing context)

  • there is no way to pick one of several system prompts (contexts) and separately pick one of several tasks to perform

  • large prompts are not easy to handle (for example, they expand the minibuffer after modifying them)

My suggestion:

Make the system prompt(s) separate from task/directive prompts, so that I can pick a system prompt and then easily work with several directives in the context of that system prompt.

Context:

I usually provide a large system prompt with a lot of context, containing for example a full description of my application, along with the requested writing style. The directives are fairly simple: expand this, rewrite that, summarize this, etc. and they change very often, almost every invocation requires editing them. This is difficult if directives are combined with the context.

When working with code, I also have a general set of rules which do not change often, and then the exact directive is something I edit on the spot.

In both cases, having a separate set of system prompts and directives would make working through gptel more convenient.

karthink commented 3 months ago
  • there is no separation between the task/directive prompt (e.g. "Rewrite this paragraph") and the system prompt (providing context)

  • there is no way to pick one of several system prompts (contexts) and separately pick one of several tasks to perform

Yes, this has been bothering me for months now, it was just low on the priority list.

  • large prompts are not easy to handle (for example, they expand the minibuffer after modifying them)

Fixed. Are there any other issues with handling large prompts?

Context: ...

I agree, the system prompt is for shaping the conversation as a whole, and we need an easy way to specify per-message instructions. So far I've just been adding this before/after the buffer contents of interest. For instance, when rewriting/refactoring I just add a comment above the region of interest explaining what I want it to do.

This was my original plan, to just use the buffer. But it doesn't always work, like when using read-only buffers, and even when it's possible I often have to manually erase my instructions from the buffer afterwards. So it might be worth adding a directives option. There are many UI considerations, however. Here are some questions, let me know what you think.

  1. Early in the project, before the term "system prompt" had stabilized, I called them directives in gptel. Now I'll probably have to replace "gptel-directives" with "gptel-system-prompts", since by directives we mean short-term task specifications. This whole thing is going to confuse users. In fact it's confusing me since I'm used to thinking of "directive" as the system prompt. What is a clear way to communicate these ideas, and the difference between the two, to new users? In the following questions, I'm going to call them the "system prompt" and "additional directives".

  2. System messages are currently set once per conversation/buffer. Should the additional directives be set for the next query only or should it be persistent?

  3. Should the directive be read from the minibuffer (like the refactoring message is right now) or from a dedicated composition buffer like the system prompt is?

  4. Internally, should the additional directives be appended to the system prompt, or prefixed to the latest user message?

  5. Following #176 and related discussions, I'm planning to add the ability to include additional context (like other buffers, regions or attach files) to the system prompt. How will this affect how the additional directives are included, both internally (in the structure of the full JSON query) and in the UI, which will get very cluttered?

Here is a view of the full transient UI: [screenshot]

Everything except the Conversation Context menu and the Additional directives feature (subject of this issue) is already implemented in gptel.

This looks pretty busy to me. Any suggestions?

karthink commented 3 months ago

@jwr I added the option for a separate directive (for the next response only) in 53dd3c5f. Please update and let me know how it works for you.

To use it, you'll have to run (setq gptel-expert-commands t). I didn't want to overwhelm new users with a giant menu.
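That is, in your init file:

```emacs-lisp
;; Show gptel's extended menu options, including the new
;; per-response directive command:
(setq gptel-expert-commands t)
```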

Here's how it looks:

[screenshot]

To add an additional directive to the next command, you can run d (below the system messages option). The directive you type in will be shown in the buffer when appropriate.

This directive is, again, for the next response only. If you want to reuse it for the session you can either add it to your full system message, or press C-x C-s in the transient menu to persist it.

jwr commented 3 months ago

I would normally not presume to comment on general UI design, but since you asked, I will give you my thoughts based on my experience using gptel (quite heavily). Please treat this as input from an advanced Emacs user (as in, 30 years or so).

First, the new system message functionality works very well and is very close to what I was looking for! It already makes working with gptel easier, thank you!

Now, more generally about the UI. If this were my project, I would redesign the UI significantly at this point. I think there are many changes that would improve usability, and this is actually a good time to make them! I wouldn't worry about existing users too much: given how relatively difficult gptel is to use, I'd say most users at this point are expert users anyway and will accept the changes.

  1. I found the default behavior of gptel-send to be unintuitive. I use this kind of binding to invoke gptel: :bind* (([(super G)] . (lambda (&optional arg) (interactive "P") (gptel-send 1)))), because the most common usage for me is to work on something selected in a buffer (mostly text, sometimes code), and I never want to just send something to an LLM right away. I always want to modify the options, specifically the directive. So gptel-send with prefix arg is always the way to go.

  2. I also use :bind (:map gptel-mode-map ("S-<return>" . gptel-send)). I know it's a personal preference, but it's in line with my years of experience with apps like Mathematica.

  3. I found the per-buffer behavior both confusing and getting in the way. When I work on something, it will usually be a single task (like writing/rewriting text), in multiple buffers. I have to remember to always make sure to have the right directive (or system prompt and directive) in every buffer. I already made the mistake of sending LLM requests with incorrect directives, because I forgot to set them again. This goes for both the system prompt and directives, and I would never want the directive to be for the next request only. I would like both system prompt and directive to be persistent until I change them.

  4. I wasn't sure where the colon before the directive came from, and what it signifies.

  5. I think the words "system prompt" and "directive" are understandable and precise.

  6. I would change the keys: "s" for system prompt, "d" for directive.

  7. I would also add "q" everywhere, which would quit. I know there is C-g, but most modern Emacs packages spoiled me by providing an easy-to-access 'q' which always just quits and closes any popup buffers.

  8. I believe the "rewrite/refactor" functionality would also benefit from all of the above (separate system prompt and directive, persistent system prompt and directive (although perhaps different prompts for rewrites?), "s" and "d" for system prompt and directive, "q" key quits). EDIT: I noticed in another issue you mentioned that you are thinking about removing the separate rewrite/refactor altogether. I think that's a good idea, although some of the output options there are nice (rewrite in place, rewrite and ediff) and it would be great if these could be folded into the main output options.

  9. Looking at the gptel menu, most of the things on the left are options I never use. I change the model very often, but it's on a "-m" binding, not a direct one. I don't use the middle column. I use the "Response to" options often. So this isn't optimized, at least not for my use case.

  10. I do not understand the "-n" option when I'm using gptel in a buffer with text. I like the "Conversation context" grouping, but if I understand correctly it should only appear when choosing options associated with a gptel conversation buffer?

  11. Finally, and this is really out of scope in this Github issue, I found the "randomness" setting confusing. I wasn't sure how a scale of 0.0 to 2.0 relates to the specific LLM I'm using.
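As an aside, the bindings from points 1 and 2 above are easier to read spelled out as a single use-package block (a sketch of my setup, not a recommendation):

```emacs-lisp
(use-package gptel
  ;; Invoke gptel-send with a prefix argument, so the options
  ;; menu always comes up first:
  :bind* (([(super G)] . (lambda (&optional arg)
                           (interactive "P")
                           (gptel-send 1))))
  ;; Send with S-RET inside gptel chat buffers:
  :bind (:map gptel-mode-map
              ("S-<return>" . gptel-send)))
```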

Now, to answer your questions directly:

  1. I think for new users "system prompt" and "directive" are clear terms. Those are the terms that LLMs use.
  2. I expected everything to be global and persistent and found it very surprising (and error-prone) that it isn't.
  3. I would start with the minibuffer and perhaps provide a configuration option for that. Some people might want to use longer directives, and the composition buffer is a convenient tool for that. I would optionally (config setting) remove the comments from the composition buffer, though.
  4. That is a good question and I am not sure if there is a correct answer. At various times, various LLMs treated the system prompt differently. OpenAI used to tell people that the user messages are more important than the system prompt, but I think that changed over time. Anthropic tells users to place all of the context in the system prompt and doesn't explicitly tell users where to place the directive itself (just experiment!). Here again I am afraid a config setting might be needed.
  5. I don't think I would use that functionality, as "C-x i" is there and that's the first thing I would think of. But overall I wouldn't worry too much about cluttering the UI. I think the most important thing is to let users do things that they need to do quickly, efficiently and without needless repetition.

These are all just my thoughts and impressions and please treat them as such — I wanted to show you how I use gptel and what I expected and found confusing. This might not be the same for everyone!

karthink commented 3 months ago

Thanks for the detailed feedback!

I would normally not presume to comment on general UI design, but since you asked, I will give you my thoughts based on my experience using gptel (quite heavily). Please treat this as input from an advanced Emacs user (as in, 30 years or so).

It's very helpful to me to understand how you use this package, so this is welcome. (Coming up on year 20 here!)

First, the new system message functionality works very well and is very close to what I was looking for! It already makes working with gptel easier, thank you!

That's great. It's not done yet, but I'd like to use it a bunch and get more feedback to see where I should take it.


Keybindings and command behavior

  1. I found the default behavior of gptel-send to be unintuitive. I use this kind of binding to invoke gptel: :bind* (([(super G)] . (lambda (&optional arg) (interactive "P") (gptel-send 1)))), because the most common usage for me is to work on something selected in a buffer (mostly text, sometimes code), and I never want to just send something to an LLM right away. I always want to modify the options, specifically the directive. So gptel-send with prefix arg is always the way to go.

The default gptel-send behavior is intended for chatting with an LLM, in a regular or chat buffer as generated by the gptel command. In this context it makes sense that gptel-send is like hitting the send button in a messaging application. If you had to bring up the menu every time you wanted to send a message, it would get tiring very quickly. For your usage I suggest binding gptel-menu to super G instead. You can ignore the gptel-send command.
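Concretely, that would be something like this in your init (assuming super G is otherwise free):

```emacs-lisp
;; Always open gptel's options menu instead of sending directly:
(global-set-key (kbd "s-G") #'gptel-menu)
```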

  1. I also use :bind (:map gptel-mode-map ("S-<return>" . gptel-send)). I know it's a personal preference, but it's in line with my years of experience with apps like Mathematica.

First, note that if you are not in gptel-mode in a chat buffer, gptel-send is not bound to anything.

I only bind it in gptel-mode-map, which is active in dedicated chat buffers. In this context: Org-mode already binds S-<return> (to org-table-copy-down, among other things). C-c <return> was free in both Org and Markdown, so I went with that.


Buffer-local values

  1. I found the per-buffer behavior both confusing and getting in the way. When I work on something, it will usually be a single task (like writing/rewriting text), in multiple buffers. I have to remember to always make sure to have the right directive (or system prompt and directive) in every buffer. I already made the mistake of sending LLM requests with incorrect directives, because I forgot to set them again. This goes for both the system prompt and directives, and I would never want the directive to be for the next request only. I would like both system prompt and directive to be persistent until I change them.

Addressing this in detail below in the next comment.


Nomenclature

  1. I think the words "system prompt" and "directive" are understandable and precise.

Actually, in the OpenAI API documentation they're calling it "system message" instead of "system prompt". Either they switched at some point or I misremembered "system prompt". Anthropic calls it the "system prompt". The Gemini API has no concept of a system prompt. So there's no standardization yet.

I can go with "System Message" for now, and switch to "System Prompt" depending on the situation in a few months.

I take your point about "directive", thanks, I'll stick with that.


Menu issues

  1. I wasn't sure where the colon before the directive came from, and what it signifies.

This is a limitation of the Transient interface: you need a unique prefix to distinguish options from each other. I think I can remove it with some elbow grease. I'll take a look eventually, but I think it's harmless for now.

  1. I would change the keys: "s" for system prompt, "d" for directive.

Done. "h" was a placeholder from the early days of the project anyway.

  1. I would also add "q" everywhere, which would quit. I know there is C-g, but most modern Emacs packages spoiled me by providing an easy-to-access 'q' which always just quits and closes any popup buffers.

This is a global Transient setting, see (info "(transient) Aborting and Resuming Transients"), and the function transient-bind-q-to-quit. Presumably you want it to be set everywhere, including in magit (if you use it) and other Transient menus?

I don't think adding it to gptel's menus specifically is the right way to go about it.
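If you do want this behavior, Transient provides it as a one-liner for your init (note it affects every Transient menu, including Magit's):

```emacs-lisp
(require 'transient)
;; Bind "q" to quit in all transient menus:
(transient-bind-q-to-quit)
```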

  1. I believe the "rewrite/refactor" functionality would also benefit from all of the above (separate system prompt and directive, persistent system prompt and directive (although perhaps different prompts for rewrites?), "s" and "d" for system prompt and directive, "q" key quits). EDIT: I noticed in another issue you mentioned that you are thinking about removing the separate rewrite/refactor altogether. I think that's a good idea, although some of the output options there are nice (rewrite in place, rewrite and ediff) and it would be great if these could be folded into the main output options.

Rewrite-in-place and rewrite-and-ediff are already available in the main output options. Rewriting in place is the "Replace/Delete prompt" option in the middle column. I realize now this may not be obvious, so I changed it to "Respond in place": [screenshot]

To Ediff, you can respond in-place and then bring up gptel-menu with the cursor in the response region. If you regenerate a response many times, you can cycle through past responses and Ediff against any of them.

Access previous variant or Ediff: [screenshot]

Ediff in progress: [screenshot]

  1. Looking at the gptel menu, most of the things on the left are options I never use. I change the model very often, but it's on a "-m" binding, not a direct one. I don't use the middle column. I use the "Response to" options often. So this isn't optimized, at least not for my use case.

I'm not sure what you mean by "not optimized" here. All keys are visible and directly accessible, so isn't the placement of the columns irrelevant? (I mean the placement of the columns, as opposed to the grouping of options into columns, which I understand is useful.)

Re: the -m binding, I don't understand what you mean by "not on a direct binding". Is it that you have to press two keys (- and m), or that you have to select the backend provider before you can choose the model? (You may not have seen this latter issue if you only use ChatGPT.)

  1. I do not understand the "-n" option when I'm using gptel in a buffer with text.

It is only relevant when you have an extended conversation (i.e. more than one back and forth) and want to limit the conversation context sent to the LLM. Note: I cannot selectively disable this option since you can have an extended conversation in any buffer (I often do), not just one with gptel-mode enabled.

I like the "Conversation context" grouping, but if I understand correctly it should only appear when choosing options associated with a gptel conversation buffer?

Not necessarily. You can be in a code buffer (say), and want to provide additional context to the LLM to help it rewrite some of your code.

  1. Finally, and this is really out of scope in this Github issue, I found the "randomness" setting confusing. I wasn't sure how a scale of 0.0 to 2.0 relates to the specific LLM I'm using.

It's the LLM temperature setting, which governs the fuzz around its choice of the next token generated from several alternatives. Basically, any ensemble of responses to the same prompt will be closer to each other at low temperatures and further apart at high ones. Following your feedback, I renamed "randomness" to "temperature" and hid it behind the gptel-expert-commands flag, so it won't show up by default. It really isn't relevant to most users.
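The setting remains available as a variable even when hidden from the menu; for example:

```emacs-lisp
;; Lower values make responses more deterministic; the default
;; is 1.0. This can also be set buffer-locally.
(setq gptel-temperature 0.7)
```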

But overall I wouldn't worry too much about cluttering the UI. I think the most important thing is to let users do things that they need to do quickly, efficiently and without needless repetition.

Agreed.


Setting directives

Now, to answer your questions directly:

  1. I would start with the minibuffer and perhaps provide a configuration option for that. Some people might want to use longer directives, and the composition buffer is a convenient tool for that. I would optionally (config setting) remove the comments from the composition buffer, though.

Reading from the minibuffer is fine for now. I want to distinguish between the system message and the "additional directive" by their purpose, more on this in the next comment.

  1. That is a good question and I am not sure if there is a correct answer. At various times, various LLMs treated the system prompt differently. OpenAI used to tell people that the user messages are more important than the system prompt, but I think that changed over time. Anthropic tells users to place all of the context in the system prompt and doesn't explicitly tell users where to place the directive itself (just experiment!). Here again I am afraid a config setting might be needed.

Presently gptel is appending the directive to the system message.

Except for GPT-3.5 and GPT-4, most LLMs don't even respect the system message right now. The Claude models ignore explicit directions half the time, as do the Mistral ones. So this whole area is a bit of a mess.


Setting Context

  1. I don't think I would use that functionality, as "C-x i" is there and that's the first thing I would think of.

C-x i (insert-file), insert-buffer, or insert-buffer-substring will add relevant text to the current buffer, which is fine if you're in a dedicated chat buffer. But if you want to provide additional context for a specific task (like refactoring), you have to open up the system prompt editor and add this text manually. These menu commands will just make it quick and efficient, and also dynamically rebuild the system prompt when the context changes.

I also plan to make it easy to add context to your system prompt from anywhere in Emacs, via a gptel-attach command. This will append to the system prompt:

  • the file at point or marked files when called from dired
  • the buffer at point or marked buffers when called from buffer-menu or ibuffer
  • the active region when called with a region active, while tracking changes to the region text

So you can build up a system prompt with the context you need easily. This context will be "live", in that the full system prompt will be built when sending the command, so it will use the up-to-date contents of the region/buffer etc.

I'm going to build this as an experiment anyway, but I don't know yet if I'm going to add it to gptel. Do you see yourself using something like this?


These are all just my thoughts and impressions and please treat them as such — I wanted to show you how I use gptel and what I expected and found confusing. This might not be the same for everyone!

Again, thanks for the detailed feedback. I mostly just get bug reports, and rarely get to learn what people find confusing or useful about the package.

My impression is that you use gptel primarily to modify or act on buffer text (via selection) in specific ways, which is a subset of gptel's uses. So it's fair that you don't find many of the menu options relevant. I use it for this purpose sometimes, but mostly I use it to look things up quickly (using the "prompt from minibuffer" and "response to minibuffer" options), or for extended chats about topics I want to learn about, which is quite different. I don't think I can optimize the UI for every use, so my design lodestone is very simple: "gptel feeds text from buffers to LLMs and inserts the responses below". All other possible behaviors are variations on this theme.

daedsidog commented 3 months ago

@karthink I noticed you mentioned that you plan to add something to add context to the system message. I for one find that the context behaves better when it is given as a user message before the actual code (for refactoring).

As you already know, I already made something that aggregates context from various buffers, and it works very well. I added it to my fork (mostly I just renamed the prefix from contexter to gptel). You can check it out at https://github.com/daedsidog/gptel/blob/contexter/gptel-contexter.el. Note that I haven't finished integrating it, because I'm not sure which branch the menu in the images appears in, as my transient menu is different from what I see in the images.

karthink commented 3 months ago

persistent system prompt and directives

This goes for both the system prompt and directives, and I would never want the directive to be for the next request only. I would like both system prompt and directive to be persistent until I change them.

I'm going to assume temporarily that buffer-locality of the system prompt is fixed, so "persistent" means that it retains its value across invocations of gptel-send (or equivalent).

In this context, what exactly is the difference between the system prompt and the directive? If the directive is persistent (as the system prompt already is), then you can just append it to the system prompt, right? My understanding is that the directive contains query-specific instructions, so it doesn't make sense to add it to the system prompt.

I would like to keep the purposes of the system prompt and the additional directive distinct.

If you want the directive to be persistent, you can set it in the transient menu and then press C-x s: [screenshot]

Pressing C-x C-s makes it persistent across Emacs sessions. You can cycle through previous states of the transient menu with C-x n and C-x p.

As an aside: one thing I would like to do is provide a gptel-additional-directives option that you can prepopulate with directives. (Ideally just gptel-directives, but that name is taken by what should be gptel-system-messages.)
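For context, gptel-directives is currently an alist of named system messages, so prepopulating it looks like this (the strings here are just placeholders):

```emacs-lisp
(setq gptel-directives
      '((default     . "You are a helpful assistant living in Emacs.")
        (programming . "You are an expert programmer. Reply only with code.")
        (writing     . "You are an editor. Improve the text that follows.")))
```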

Buffer-local configurations

I found the per-buffer behavior both confusing and getting in the way. When I work on something, it will usually be a single task (like writing/rewriting text), in multiple buffers. I have to remember to always make sure to have the right directive (or system prompt and directive) in every buffer. I already made the mistake of sending LLM requests with incorrect directives, because I forgot to set them again.

I would like to address this frustration in gptel, but I have to understand the problem fully first.

My original assumption was that tasks with gptel would be buffer-specific, as many of mine are. I guess this was wrong -- I'd like to hear from more users about this.

Here are the buffer-local user options gptel provides:

The additional directive is currently not stored.

Of these, I'm assuming that

The buffer-local design has its merits: I can be using gptel in a chat buffer next to one with code and switching between them, for which I need buffer-specific system messages. I don't know yet what the best way to resolve this is.

Here's a potential solution (for system messages only, not the model). I can make it so when you set/edit/choose the system message from the transient menu, you have a choice of setting it globally or buffer-locally. It's one extra keypress, but might be worth the trouble.
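As a stopgap, a user who wants one system message everywhere can change the global default instead of the buffer-local value. Note that gptel--system-message is an internal variable, so this is unsupported and may change:

```emacs-lisp
;; Change the default (global) system message rather than any
;; buffer-local override. Internal variable: subject to change.
(setq-default gptel--system-message
              "You are an expert technical editor. Be concise.")
```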

Let me know what you think.

karthink commented 3 months ago

@daedsidog

@karthink I noticed you mentioned that you plan to add something to add context to the system message. I for one find that the context behaves better when it is given as a user message before the actual code (for refactoring).

Thanks for reporting this. Anthropic recommends otherwise, so is this with the GPT models? The plan I described above works the same either way. In fact there's more of a case for building the query from a list of sources dynamically if it's added as a user message instead of a system prompt, since you can't run insert-file in a code buffer above your code.

As you already know, I already made something that aggregates context from various buffers, and it works very well. I added it to my fork (mostly I just renamed the prefix from contexter to gptel). You can check it out at https://github.com/daedsidog/gptel/blob/contexter/gptel-contexter.el. Note that I haven't finished integrating it, because I'm not sure which branch the menu in the images appears in, as my transient menu is different from what I see in the images.

I'll check it out, thanks! All the images are from the master branch, the transient menu has had several minor tweaks in the past couple of days but I think I'm close to done.

There are also a few new menu options, as discussed in this thread, accessible by setting gptel-expert-commands to t. I will make the new "add directive" option visible by default soon.

daedsidog commented 3 months ago

@karthink I'll make a pull request so that we won't have this fragmented communication about this, and everything will be in one area.

EDIT: https://github.com/karthink/gptel/pull/256

jwr commented 3 months ago

Thank you for considering my comments! In order to keep this brief, I'll try to respond only to things where a response is necessary. In many cases you either pointed out a better way to do something or stated why you believe things should be done a certain way and I won't discuss those.

Menu issues

  1. I wasn't sure where the colon before the directive came from, and what it signifies.

This is a limitation of the Transient interface: you need a unique prefix to distinguish options from each other. I think I can remove it with some elbow grease. I'll take a look eventually, but I think it's harmless for now.

I wouldn't worry about it too much. It was somewhat confusing, but as long as it doesn't end up in prompts, it doesn't matter.

  1. I would also add "q" everywhere, which would quit. I know there is C-g, but most modern Emacs packages spoiled me by providing an easy-to-access 'q' which always just quits and closes any popup buffers.

This is a global Transient setting, see (info "(transient) Aborting and Resuming Transients"), and the function transient-bind-q-to-quit. Presumably you want it to be set everywhere, including in magit (if you use it) and other Transient menus?

I don't think adding it to gptel's menus specifically is the right way to go about it.

Indeed, probably not — I didn't know about this option. I was used to Magit allowing me to use 'q' to quit anywhere, and I thought this was the default for transients.

To Ediff, you can respond in-place and then bring up gptel-menu with the cursor in the response region. If you regenerate a response many times, you can cycle through past responses and Ediff against any of them.

Ok, I had absolutely no idea that generated text is anything more than text and that gptel can interact with it! It's a great feature, but might be difficult to discover. BTW, "Ediff previous" still has some rough edges (sometimes gives me "Marker does not point anywhere", and when I quit ediff it tries to kill the buffer, but those might be my issues, I'll keep testing).

  1. Looking at the gptel menu, most of the things on the left are options I never use. I change the model very often, but it's on a "-m" binding, not a direct one. I don't use the middle column. I use the "Response to" options often. So this isn't optimized, at least not for my use case.

I'm not sure what you mean by "not optimized" here. All keys are visible and directly accessible, so isn't the placement of the columns irrelevant? (I mean the placement of the columns, as opposed to the grouping of options into columns, which I understand is useful.)

Re: the -m binding, I don't understand what you mean by "not on a direct binding". Is it that you have to press two keys (- and m), or that you have to select the backend provider before you can choose the model? (You may not have seen this latter issue if you only use ChatGPT.)

I mean that I need to press two keys to perform a common (in my case) operation. I often switch between models.

  1. Finally, and this is really out of scope in this Github issue, I found the "randomness" setting confusing. I wasn't sure how a scale of 0.0 to 2.0 relates to the specific LLM I'm using.

It's the LLM temperature setting, which governs the fuzz around its choice of the next token generated from several alternatives. Basically, any ensemble of responses to the same prompt will be closer to each other at low temperatures and further apart at high ones. Following your feedback, I renamed "randomness" to "temperature" and hid it behind the gptel-expert-commands flag, so it won't show up by default. It really isn't relevant to most users.

What I meant is that for GPT-4 the temperature range is 0.0 to 2.0, while for Claude 3 it's 0.0 to 1.0. Just looking at the UI, I wasn't sure whether gptel rescales the values.

Setting directives

[...]

  1. That is a good question and I am not sure if there is a correct answer. At various times, various LLMs treated the system prompt differently. OpenAI used to tell people that the user messages are more important than the system prompt, but I think that changed over time. Anthropic tells users to place all of the context in the system prompt and doesn't explicitly tell users where to place the directive itself (just experiment!). Here again I am afraid a config setting might be needed.

Presently gptel is appending the directive to the system message.

Except for GPT-3.5 and GPT-4, most LLMs don't even respect the system message right now. The Claude models ignore explicit directions half the time, as do the Mistral ones. So this whole area is a bit of a mess.

Yes, it is, and I don't think there is a good general way to address this. One could invent new naming ("context prompt"), but I think it's better to just go along and provide flexibility.

@daedsidog is right that context might behave better if provided as a user prompt. I experimented (both with GPT-4 and Anthropic Claude) with using a system prompt for context and with providing context in a separate user-agent interaction. This is somewhat more difficult with Anthropic, because they don't allow multiple user prompts without agent responses, so I invented a response like "Understood." and went with that. Eventually I decided that the system prompt works well, so I use that both for GPT-4 and Anthropic Claude, but this might change in the future.

Setting Context

[...] I also plan to make it easy to add context to your system prompt from anywhere in Emacs, via a gptel-attach command. This will append to the system prompt

  • the file at point or marked files when called from dired
  • the buffer at point or marked buffers when called from buffer-menu or ibuffer
  • the active region when called with a region active, while tracking changes to the region text

So you can build up a system prompt with the context you need easily. This context will be "live", in that the full system prompt will be built when sending the command, so it will use the up-to-date contents of the region/buffer etc.

I'm going to build this as an experiment anyway, but I don't know yet if I'm going to add it to gptel. Do you see yourself using something like this?

I don't think so, I can't think of a use for this right now. I will respond separately to your questions about system/directive separation and I'll try to describe why this is important for me, perhaps this will help explain my use case better.

[...]

My impression is that you use gptel primarily to modify or act on buffer text (via selection) in specific ways, which is a subset of gptel's uses. So it's fair that you don't find many of the menu options relevant. I use it for this purpose sometimes, but mostly I use it to look things up quickly (using the "prompt from minibuffer" and "response to minibuffer" options), or for extended chats about topics I want to learn about, which is quite different. I don't think I can optimize the UI for every use, so my design lodestone is very simple: "gptel feeds text from buffers to LLMs and inserts the responses below". All other possible behaviors are variations on this theme.

That makes perfect sense.

I do use gptel in other ways, too — I use chat buffers sometimes, or work on whole buffers instead of on selections. But it's true I never use it with the minibuffer, I never have anything as short.

jwr commented 3 months ago

persistent system prompt and directives

This goes for both the system prompt and directives, and I would never want the directive to be for the next request only. I would like both system prompt and directive to be persistent until I change them.

I'm going to assume temporarily that buffer-locality of the system prompt is fixed, so "persistent" means that it retains its value across invocations of gptel-send (or equivalent).

In this context, what exactly is the difference between the system prompt and the directive? If the directive is persistent (as the system prompt already is), then you can just append it to the system prompt, right? My understanding is that the directive contains query-specific instructions, so it doesn't make sense to add it to the system prompt.

I would like to keep the purposes of the system prompt and the additional directive distinct.

I think examples are in order.

  1. Tasks like explaining or refactoring code: the system prompt largely doesn't matter.

  2. Writing and editing text: here my system prompt contains a general description of the writing style that I want the LLM to use (a "persona" description). It's between one and three paragraphs long. I currently use two such prompts, describing two styles of writing, and switch between them.

  3. Writing or translating text related to my SaaS: here the system prompt gets really large, because it contains a full description of the app, and the writing style that I expect. It's similar to how you would brief a translator about what they will be translating. We're talking about 35 paragraphs, around 1200 words or so.

In (2) and (3) the system prompt provides the context, which rarely changes, and doesn't need editing every time I perform a task.

In contrast to that, the directive is usually fairly simple, but changes much more often (but not every time!).

The distinction is that the system prompt provides a general context, while the directive provides specific instructions on what to do with the supplied data.

Now, going back to persistence, here's what I would do quite often:

  1. Select a system prompt (working on my SaaS today).

  2. Write a directive (rewrite this paragraph fixing grammar mistakes, improve clarity, etc).

  3. Apply that directive to a paragraph in a buffer.

  4. Switch to a different buffer and do the same kind of rewrites there.

  5. Repeat (3) and (4) multiple times.

  6. Change the directive when switching to a different task (writing outlines, summarizing, etc).

  7. Perform that task in multiple buffers.

As you can see, at least in my workflow there is a significant distinction between the "context" and the directive. The context changes rarely, the directive more often, but also not at every invocation.

[...]

As an aside: one thing I would like to do is provide a gptel-additional-directives option that you can prepopulate with directives. (Ideally just gptel-directives, but that name is taken by what should be gptel-system-messages.)

That would be quite useful for me.

Buffer-local configurations

I found the per-buffer behavior both confusing and getting in the way. When I work on something, it will usually be a single task (like writing/rewriting text), in multiple buffers. I have to remember to always make sure to have the right directive (or system prompt and directive) in every buffer. I already made the mistake of sending LLM requests with incorrect directives, because I forgot to set them again.

I would like to address this frustration in gptel, but I have to understand the problem fully first.

My original assumption was that tasks with gptel would be buffer-specific, as many of mine are. I guess this was wrong -- I'd like to hear from more users about this.

That assumption is not true for my workflow. I work with tens of text buffers (at least). Think about a website with tens/hundreds of pages, every page is in a separate .md file, and I'll be performing the same tasks with the same parameters across multiple buffers.

Here are the buffer-local user options gptel provides:

  • gptel-backend: the LLM provider
  • gptel-model
  • gptel--system-message
  • gptel-temperature, gptel-max-tokens and gptel--num-messages-to-send

The additional directive is currently not stored.

Of these, I'm assuming that

  • you can set the default backend once in your configuration and almost never need to change it.

Not quite true, see below.

  • you change the model semi-frequently, and would like it to be set globally across Emacs.

True. But this also implies the backend will change. I currently use "gpt-4-turbo-preview" and "gpt-4-0613" through OpenAI, "claude-3-opus-20240229" and "claude-3-sonnet-20240229" through Anthropic, and "deepseek-coder:33b-instruct-q8_0" through Ollama (occasionally trying other local Ollama models).
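A configuration covering this mix of backends can be sketched with gptel's backend constructors. This is only a sketch: the `:key` and `:host` values below are placeholders, and the model list should match whatever you have available.

```elisp
;; Sketch only: keys and hosts below are placeholders.
;; OpenAI is gptel's default backend, so only the model needs setting:
(setq gptel-model "gpt-4-turbo-preview")

;; Register Anthropic's Claude models as an additional backend:
(gptel-make-anthropic "Claude"
  :stream t
  :key "sk-ant-...")                   ;or a function returning the key

;; Register a local Ollama backend with the models you have pulled:
(gptel-make-ollama "Ollama"
  :host "localhost:11434"
  :stream t
  :models '("deepseek-coder:33b-instruct-q8_0"))
```

With all three registered, the flattened model selector in gptel-menu lists every model from every backend together.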

  • you change the system message frequently, and would like it to be set globally across Emacs.

I don't change the system messages frequently, but I do sometimes switch between those that I've defined.

  • the rest are irrelevant to this discussion.

The buffer-local design has its merits: I can be using gptel in a chat buffer next to one with code and switching between them, for which I need buffer-specific system messages. I don't know yet what the best way to resolve this is.

Here's a potential solution (for system messages only, not the model). I can make it so when you set/edit/choose the system message from the transient menu, you have a choice of setting it globally or buffer-locally. It's one extra keypress, but might be worth the trouble.

Let me know what you think.

Hmm. I don't find that very appealing. The beauty and promise of gptel is that it might let people perform complex tasks with minimal interaction (as measured by the number of keypresses required to perform a common task).

Perhaps something along the lines of a config variable, gptel-persist-choices, settable to one of nil (no persistence at all), :buffer, :session? I think people would generally want one kind of persistence. I am certainly in the :session camp, if not even in the :saved-to-file camp :-)
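To make the suggestion concrete, such an option could look something like the following. This is purely illustrative: gptel-persist-choices does not exist in gptel.

```elisp
;; Purely illustrative -- gptel-persist-choices is not a real option.
(defcustom gptel-persist-choices :session
  "How long settings chosen in the gptel menu persist.
nil       -- apply to the next request only
:buffer   -- persist in the current buffer
:session  -- persist globally for this Emacs session"
  :type '(choice (const :tag "Next request only" nil)
                 (const :tag "Per buffer" :buffer)
                 (const :tag "Whole session" :session))
  :group 'gptel)
```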

daedsidog commented 3 months ago

In my current setup, I bake the context in the user prompt that is being sent to the model anyway, so there shouldn't be problems even with models that don't allow multiple prompts if done this way.

karthink commented 3 months ago

@jwr

BTW, "Ediff previous" still has some rough edges (sometimes gives me "Marker does not point anywhere", and when I quit ediff it tries to kill the buffer, but those might be my issues, I'll keep testing).

Noted, consider it an experimental feature for now. I also have to make it discoverable.

I mean that I need to press two keys to perform a common (in my case) operation. I often switch between models.

You can replace the key for choosing a model.

(transient-suffix-put 'gptel-menu (kbd "-m") :key "l")

I might change the defaults eventually.

I also flattened the model selection process now so you don't have to pick the provider first, it's all together. So it should be faster.

What I meant is that for GPT-4 temperature range is from 0.0 to 2.0, while for Claude 3 it's from 0.0 to 1.0. Just looking at the UI I wasn't sure if gptel doesn't rescale the values.

It does not, this needs to be fixed.

That assumption is not true for my workflow. I work with tens of text buffers (at least). Think about a website with tens/hundreds of pages, every page is in a separate .md file, and I'll be performing the same tasks with the same parameters across multiple buffers. ... I don't change the system messages frequently, but I do sometimes switch between those that I've defined.

I made all the model and request parameters (backend, model, system message, temperature, max tokens etc) global variables by default, so your needs are met. It also saves me the headache of explaining why you shouldn't use setq to set them -- about two issues a week involve confusion on this matter.

Here's a potential solution (for system messages only, not the model). I can make it so when you set/edit/choose the system message from the transient menu, you have a choice of setting it globally or buffer-locally. It's one extra keypress, but might be worth the trouble.

Hmm. I don't find that very appealing. The beauty and promise of gptel is that it might let people perform complex tasks with minimal interaction (as measured by the number of keypresses required to perform a common task).

This is what I did, but instead of explaining why there aren't more keypresses involved, I'll let you examine the menu yourself.


I'll respond to the other points (context, system message and directives) in detail after I've had time to think over them.

jwr commented 3 months ago

I also flattened the model selection process now so you don't have to pick the provider first, it's all together. So it should be faster.

I like it! It is indeed faster.

I made all the model and request parameters (backend, model, system message, temperature, max tokens etc) global variables by default, so your needs are met. It also saves me the headache of explaining why you shouldn't use setq to set them -- about two issues a week involve confusion on this matter.

🙂

Here's a potential solution (for system messages only, not the model). I can make it so when you set/edit/choose the system message from the transient menu, you have a choice of setting it globally or buffer-locally. It's one extra keypress, but might be worth the trouble.

Hmm. I don't find that very appealing. The beauty and promise of gptel is that it might let people perform complex tasks with minimal interaction (as measured by the number of keypresses required to perform a common task).

This is what I did, but instead of explaining why there aren't more keypresses involved, I'll let you examine the menu yourself.

Ok, I misunderstood what you wanted to implement. The current solution is really nice, works very well for me, and hopefully for everyone else, too. Thank you!

jwr commented 3 months ago

This issue ballooned to encompass many topics, but I had some feedback related to the initial feature request, and I thought I might as well add it here.

I've been switching between GPT-4 and Anthropic models a lot recently, and found out that Anthropic models like to get the context in the system message, but really need the specific directives in the user message. Directives (understood as "instructions on what to do specifically with the rest of the user input") appended to the system message simply do not work at all in some cases. GPT-4 seems to be less picky about that.

Which brings us back to the question:

Internally, should the additional directives be appended to the system prompt, or prefixed to the latest user message?

I would suggest this is a per-backend thing. For example, right now I've been looking for a way to modify gptel so that it prepends the directive to the text sent in the user message, for Anthropic only. Even a binary configuration variable would help, but I guess a more configurable system will be needed in the future (a function?).
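A per-backend option along these lines might look like the following sketch. The variable gptel-directive-placement is hypothetical, not an existing gptel option; it is just one way to express "system message by default, user message for Anthropic".

```elisp
;; Hypothetical sketch: where the additional directive should go,
;; keyed by backend name.  Not an existing gptel variable.
(setq gptel-directive-placement
      '(("Claude" . user-message)     ;Anthropic: prepend to user turn
        (t        . system-message))) ;default: append to system message
```

A function-valued option (taking the backend and returning a placement) would generalize this further.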

tusharhero commented 2 months ago

I would also like to add that forcing the user to send a prompt when they might not have written one yet is very annoying: [screenshot] I keep having to abort the gptel request immediately after changing a setting. It just feels unnatural to write the prompt before modifying things like the directive or the model.

karthink commented 2 months ago

I would also like to add that forcing the user to send a prompt when they might not have written a prompt

@tusharhero I don't follow, why do you have to send the prompt after changing a setting?

tusharhero commented 2 months ago

Let's say I make a new gptel buffer and see that I don't want to use the current model, so I run gptel-menu and change the model: [screenshot] You hit enter, and now it sends the prompt, so you have to stop the request using gptel-abort. Don't you think it would be better to just do nothing when hitting enter? We already have gptel-send bound to C-c RET.

karthink commented 2 months ago

You hit enter, now it sends the prompt, and you now have to stop the request using gptel-abort,

Just quit the menu instead? You can quit with C-g (always) or q (if you've set transient-bind-q-to-quit).

tusharhero commented 2 months ago

It feels unintuitive; I expected that hitting enter would change the settings and nothing more. What is the reasoning behind having an extra way to send the prompt, though?

karthink commented 2 months ago

gptel-send will always insert the response below the prompt, it's like the "send" button in a messaging application. For all alternate behavior, you have to use the transient menu.

karthink commented 2 months ago

@tusharhero IIUC, you want to use gptel-send quite differently from how it's intended. You'd like to

  1. Set gptel's options from the transient menu, and "confirm" your choices by pressing return.
  2. Call gptel-send to send a query with these options applied.
  3. You want these options to be persistent. For example, if you've set the output to be redirected to another buffer, you want this to happen every time you call gptel-send until you open the menu and clear this setting.

Is this correct?

tusharhero commented 2 months ago

Yes, that is correct @karthink .

karthink commented 2 months ago
  1. Set gptel's options from the transient menu, and "confirm" your choices by pressing return.
  2. Call gptel-send to send a query with these options applied.
  3. You want these options to be persistent. For example, if you've set the output to be redirected to another buffer, you want this to happen every time you call gptel-send until you open the menu and clear this setting.

@tusharhero

;; Command to save and quit gptel menu 
(transient-define-suffix gptel-transient-save-and-quit (&rest _)
  :key "RET"
  :description "Save options and exit"
  (interactive)
  (transient-set)
  (transient-quit-one))

(transient-suffix-put 'gptel-menu (kbd "RET") :command 'gptel-transient-save-and-quit)

;; gptel-send with saved transient options
(defun gptel-send-with-options (&optional arg)
  "Send query.  With prefix ARG open gptel's menu instead."
  (interactive "P")
  (if arg
      (call-interactively 'gptel-menu)
    (gptel--suffix-send (transient-args 'gptel-menu))))

(keymap-set gptel-mode-map "<remap> <gptel-send>" #'gptel-send-with-options)

Also see the wiki.

karthink commented 2 months ago

I made all the model and request parameters (backend, model, system message, temperature, max tokens etc) global variables by default, so your needs are met. It also saves me the headache of explaining why you shouldn't use setq to set them -- about two issues a week involve confusion on this matter.

Ok, I misunderstood what you wanted to implement. The current solution is really nice, works very well for me, and hopefully for everyone else, too. Thank you!

I've got my first issue now from a user confused about model parameters not being buffer-local by default now, so it looks like there's no winning here.

braineo commented 2 months ago

I made all the model and request parameters (backend, model, system message, temperature, max tokens etc) global variables by default, so your needs are met. It also saves me the headache of explaining why you shouldn't use setq to set them -- about two issues a week involve confusion on this matter.

Ok, I misunderstood what you wanted to implement. The current solution is really nice, works very well for me, and hopefully for everyone else, too. Thank you!

I've got my first issue now from a user confused about model parameters not being buffer-local by default now, so it looks like there's no winning here.

While it might seem like expanding the scope for no reason, would keeping the variables per “project” somehow balance the needs? Or simply let users define parameter presets and be able to switch between them quickly?

jwr commented 2 months ago

I've got my first issue now from a user confused about model parameters not being buffer-local by default now, so it looks like there's no winning here.

Yes, I'm afraid so 🙂

While it won't make your life as a designer easier, I'll share some more thoughts after spending a couple of weeks using the updated interface.

Surprisingly, I find myself going to the web versions of ChatGPT and Anthropic developer console quite often. That's because they impose a minimal cognitive load: they are simple and predictable. Compared to that, pulling up the gptel menu consumes a lot of my thinking resources. I need to carefully go through all the options and check if this is indeed what I want. If I quit the gptel menu and re-open it a second later, some options will remain the same (like the model), and some will change (the directive will disappear). So, I need to go through them every time.

Then there are bugs or issues related to the complexity of gptel's input handling. What will get sent? Will it be the entire buffer? I need to worry about where the point is. And sometimes the entire buffer does not get sent, even though the point is at the end, so just to be sure, I now always set up a region and make sure that the point is at the end. I find myself longing for something more predictable and requiring less of my brainpower to use.

That unpredictability (and I do realize this is subjective), cognitive load, and the lack of flexibility of how the directive is handled (appending to the system prompt is not the best solution with the models I use) mean that I end up using the web versions quite often. Even copying/pasting to and from these web consoles is sometimes easier than figuring out exactly what will happen when I press RET.

This is not a complaint, just an observation. I don't have all the right answers. One thing that is clear in my case, though, is that making variables buffer-local would make the cognitive load much higher.

karthink commented 2 months ago

Surprisingly, I find myself going to the web versions of ChatGPT and Anthropic developer console quite often. That's because they impose a minimal cognitive load: they are simple and predictable. Compared to that, pulling up the gptel menu consumes a lot of my thinking resources. I need to carefully go through all the options and check if this is indeed what I want. If I quit the gptel menu and re-open it a second later, some options will remain the same (like the model), and some will change (the directive will disappear). So, I need to go through them every time.

I don't understand this. If you open a dedicated chat buffer with M-x gptel, set a system message and start typing, how is the experience different from the web? I haven't used the web interface in a while. As I remember it, there is no "additional directive" in the web interface. It functions like a shell, and the system message functions like an environment variable. Have things changed?

Then there are bugs or issues related to the complexity of gptel's input handling. What will get sent? Will it be the entire buffer? I need to worry about where the point is.

It's always the buffer up to point, or the region if it is active. Any other behavior is a bug.

And sometimes the entire buffer does not get sent, even though the point is at the end, so just to be sure, I now always set up a region and make sure that the point is at the end.

If the point is at the end of the buffer but the entire buffer isn't sent, it's a bug, not complexity related to gptel's input handling. It will be useful if you have a reproducible example.

(The only exception is if the user sets special options like gptel-org-branching-context or gptel Org properties. But all of these are disabled by default. It's on the user to track use of this.)

That unpredictability (and I do realize this is subjective), cognitive load, and the lack of flexibility of how the directive is handled (appending to the system prompt is not the best solution with the models I use) mean that I end up using the web versions quite often.

What is the equivalent of the "additional directive" when you use the web interface? I thought it works like a chat app, the only input besides the system message is whatever you type into the box.

Even copying/pasting to and from these web consoles is sometimes easier than figuring out exactly what will happen when I press RET.

I understand the cognitive load aspect in that there are many ways to use gptel. But if you want to use it like you do the web interface, it is intended to work exactly the same, with minimal cognitive load.

jwr commented 2 months ago

Then there are bugs or issues related to the complexity of gptel's input handling. What will get sent? Will it be the entire buffer? I need to worry about where the point is.

It's always the buffer up to point, or the region if it is active. Any other behavior is a bug.

Hmm. Doesn't gptel use text properties to annotate text and then use them for sending?

Here is an example. I am looking at my *scratch* buffer, where I've had some exchanges already. The point is at the end, there is no region:

[screenshot]

And here is what "Inspect JSON" shows:

[screenshot]
karthink commented 2 months ago

Here is an example. I am looking at my *scratch* buffer, where I've had some exchanges already. The point is at the end, there is no region:

[screenshot]

And here is what "Inspect JSON" shows: [screenshot]

The Ollama API does not accept the full chat contents, only the latest prompt and the context vector that's a stateful encoding of the conversation history.

But the behavior should be exactly the same as a web UI: If you type below a response, the conversation should continue like a regular back-and-forth, with no ambiguity.

The context vector is missing from the payload here, which may be a bug. Can you tell me more about what you've done in the scratch buffer besides typing and calling gptel-send?

EDIT: I see in #279 that you've modified the processing to remove the context vector, that could be the problem.

jwr commented 2 months ago

I'm not sure if I explained this clearly. I am not working from a gptel buffer. I am working in my own text buffer, and I expected the full buffer contents to be sent, because there was no region and the point was at the end of the buffer. This is indeed what was happening, up to a certain point, and then (I'm not sure when this changed) when I checked the JSON output, only one line from my buffer was sent as the prompt, not the entire buffer.

Please look at the *scratch* buffer on the left side — it's not a gptel session. It contains a "dialog", but only in the sense that every gptel-send invocation sends the entire buffer.

This should have nothing to do with the context vector — I'm focusing strictly on buffer contents being (or not being) put into the prompt.

karthink commented 2 months ago

Please look at the *scratch* buffer on the left side — it's not a gptel session. It contains a "dialog", but only in the sense that every gptel-send invocation sends the entire buffer.

I understood your screenshot correctly.

I'm not sure if I explained this clearly. I am not working from a gptel buffer. I am working in my own text buffer, and I expected the full buffer contents to be sent, because there was no region and the point was at the end of the buffer. This is indeed what was happening, up to a certain point,

This is not the case. Only your latest prompt is sent when using Ollama, with the rest of the history encoded in the context vector (that you've disabled).

and then (I'm not sure when this changed) when I checked the JSON output, only one line from my buffer was sent as the prompt, not the entire buffer.

This is the expected behavior.

This should have nothing to do with the context vector — I'm focusing strictly on buffer contents being (or not being) put into the prompt.

It does not accept the buffer contents, only a single user prompt.


How you want to use Ollama is at odds with how the API works. The API is not stateless, you want it to be. This is not something gptel can fix.

I think we can sidestep this whole issue. Taking a look at the Ollama API again, they appear to have added an OpenAI style send-everything endpoint. When I have time I will check how backwards compatible this is (with older Ollama versions that gptel users might have installed) and switch gptel over to that.

jwr commented 2 months ago

I am very confused now. I don't understand what "my latest prompt" is, especially in the context of what you wrote before:

It's always the buffer up to point, or the region if it is active. Any other behavior is a bug.

Let's forget Ollama for a moment. Assuming for example I switch to one of the OpenAI models, looking at the screenshot I posted above, what gets sent? My guess would be the system prompt, then the entire buffer contents in the first user message, is that correct?

And if so, is that behavior different for a different provider?

jwr commented 2 months ago

I thought perhaps more screenshots will help me explain what I mean. Please take a look at these three screenshots. On the left is what I see, it's all the information I have and I'm trying to guess what gptel will do. On the right you can see three different behaviors, each one will give me very different results. But I have no way of knowing what will happen just by looking at my text buffer and gptel options.

What I really expect to happen is in the first screenshot. System prompt, then the contents of my buffer up to the point, no magic processing behind the scenes, no additional hidden context.

[three screenshots]

As to the Ollama API, perhaps I misunderstand, but I don't see a problem: you may (but do not have to) pass the "context" parameter. If you don't, only what you send is used. That is exactly what I'm trying to achieve. I do not expect a conversational chat interface in my text buffer, I am looking for a predictable tool for manipulating my text.

I hope this helps clarify what I'm trying to do 🙏

karthink commented 1 month ago

@jwr are you still using gptel, or did you move to the web interface/another package?

I have a little bit of time to work on gptel again, and wanted to check before addressing your comments in this thread (and #277, #291) .

jwr commented 1 month ago

I do use gptel, although rarely. I would like to use it, but it's too unpredictable and error-prone and requires too much attention from me at the moment (as described already). I started doing most of my daily work in Typing Mind, which has a simple chat interface for both OpenAI and Anthropic. It lets me define "AI agents" (basically, system prompts) and I use that. It's not great, but it works.

I have been waiting and hoping that you would find some time to consider my use cases, but I didn't want to pressure you or file too many issues — after all, you owe me absolutely nothing and I do appreciate that I'm getting an already excellent tool for free 🙂. I still hope to use gptel as my main tool in the future.

karthink commented 1 month ago

Cool. I'll go over your messages in the three threads carefully and respond over the next few days.

jwr commented 1 month ago

If I may suggest: I think individual responses are not the best use of your time. I hope I have managed to describe the use case (non-conversational use, expectation of predictable and consistent behavior, minimal brainpower required to operate). If you agree that this is a use case that gptel should address, that's great! If you think this is not a common use case and I should modify gptel myself, then you could choose to make gptel slightly easier to modify towards these goals. And if you disagree altogether and believe that this kind of usage is not what gptel is for, then a simple statement is enough as well!

I am of course happy to provide more feedback if needed.

tusharhero commented 1 month ago

@karthink , I would like if there was a option to select no directives by default.

karthink commented 1 month ago

If I may suggest: I think individual responses are not the best use of your time. I hope I have managed to describe the use case (non-conversational use, expectation of predictable and consistent behavior, minimal brainpower required to operate). If you agree that this is a use case that gptel should address, that's great!

Having thought about this now, there are a couple of more fundamental issues with gptel's response tracking that need to be addressed first. There is both a design issue and a technical limitation with Elisp that makes this tricky. I started a discussion (#321) to get some feedback on the design part.

If you think this is not a common use case and I should modify gptel myself, then you could choose to make gptel slightly easier to modify towards these goals. And if you disagree altogether and believe that this kind of usage is not what gptel is for, then a simple statement is enough as well!

Right now it's definitely possible to tweak a few gptel settings and get what you want, for all behaviors except adding the directive to the first user message (instead of the system message). I could explain how to do these things, but when I address the underlying issue with gptel's response tracking you'll have to do more busywork to adapt your tweaks. If you can wait a little longer I should have better solutions.

karthink commented 1 month ago

@karthink , I would like if there was a option to select no directives by default.

I don't know what you mean. You can just set a blank system message? And there is no default "additional directive" anyway.

tusharhero commented 1 month ago

It selects a directive by default? No?

jwr commented 1 month ago

Right now it's definitely possible to tweak a few gptel settings and get what you want, for all behaviors except adding the directive to the first user message (instead of the system message). I could explain how to do these things, but when I address the underlying issue with gptel's response tracking you'll have to do more busywork to adapt your tweaks. If you can wait a little longer I should have better solutions.

Thank you 🙏 — I can wait. In the meantime, I'll post a bit of feedback in #321.

karthink commented 1 month ago

It selects a directive by default? No?

There is always a system message, you can set it to be blank by customizing gptel-directives.
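For example, adding a blank entry to gptel-directives makes an empty system message selectable from the menu (the name `none` here is an arbitrary choice):

```elisp
;; Make an empty system message selectable under the name `none'.
(add-to-list 'gptel-directives '(none . ""))
```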

karthink commented 2 weeks ago

@tusharhero Did this comment address your issue?