Closed jwr closed 2 months ago
Ok, having taken some time to read the code, I think I'm slightly less confused as to what might be going on. But here's a thought: I think I have different expectations when calling gptel from my buffer.
What I do expect is that gptel will do the same thing every time I call it, i.e. use my buffer/region contents along with the system prompt and directive to compose a request and do something with the results. I do not expect conversation awareness, context keeping, recognizing previous responses in the buffer, or really anything else. In fact, those things interfere with what I'm trying to do.
I think it would be great if gptel allowed me to use it in this simple "deterministic mode".
After reading the code, it seems that some of my confusion comes from the fact that `gptel-menu` is used both for setting options in gptel buffers and for using gptel in other buffers. The code is also shared. But perhaps those are really two different scenarios with common options?
The confusion is due to a combination of two different behaviors.
If you receive a response from a model and edit it, your edits are not considered part of the response; they become an additional user prompt in between. The other way to handle this would be to consider edits to the response as part of the response. I've gone back and forth between the two approaches; there are valid reasons for wanting both behaviors.
> What I do expect is that gptel will do the same thing every time I call it, i.e. use my buffer/region contents along with the system prompt and directive to compose a request and do something with the results. I do not expect conversation awareness, context keeping, recognizing previous responses in the buffer, or really anything else. In fact, those things interfere with what I'm trying to do.
This is not possible with Ollama. Unlike the OpenAI-inspired APIs, the Ollama API is stateful and works by passing a growing context vector back and forth along with the latest user prompt (and nothing else). Coupled with the previous point, you can see how only the last chunk is sent.
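For illustration only (this is not gptel's code), the context-passing behavior described above corresponds to Ollama's native `/api/generate` endpoint: each response carries an opaque `context` array, and the client echoes it back with only the newest prompt. A sketch of the exchange, with a placeholder model name and made-up token values:

```jsonc
// first request: just the prompt
{ "model": "mistral", "prompt": "Hello", "stream": false }

// response includes an opaque context vector
{ "response": "Hi there!", "context": [1, 529, 16289], "done": true }

// follow-up: the client sends the context back plus ONLY the new prompt;
// earlier prompts and responses are never re-sent as text
{ "model": "mistral", "prompt": "Tell me more", "context": [1, 529, 16289] }
```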
Ollama plans to offer, or perhaps already offers, an OpenAI-compatible API. I haven't looked into how to access it yet; you can try using that (by creating a backend with `gptel-make-openai`) for stateless behavior.
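As a rough sketch of what such a backend definition might look like (assuming Ollama exposes its OpenAI-compatible API at `localhost:11434/v1`; the backend and model names below are placeholders, not something gptel or Ollama prescribes):

```elisp
;; Hypothetical: register Ollama's OpenAI-compatible endpoint as a
;; stateless gptel backend, instead of the native (stateful) Ollama one.
(gptel-make-openai "Ollama-OpenAI"          ; arbitrary backend name
  :host "localhost:11434"                   ; assumed default Ollama port
  :protocol "http"                          ; local server, no TLS
  :endpoint "/v1/chat/completions"          ; OpenAI-compatible route
  :stream t
  :models '("mistral:latest"))              ; placeholder model name
```

Selecting this backend would then send the full message list with every request, like the other OpenAI-style backends, rather than a context vector.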
> If you receive a response from a model and edit it, your edits are not considered part of the response; they become an additional user prompt in between. The other way to handle this would be to consider edits to the response as part of the response. I've gone back and forth between the two approaches; there are valid reasons for wanting both behaviors.
Hmm. I can see how this is useful, but I find myself running into unexpected behavior (like what I reported above) and mistrusting gptel as a result. When working with my own buffers, I'd really like a "simple mode" where I can always predict what gptel will send as a query. Usually that would be the text I selected, or the entire buffer contents, with the directive prepended (directives currently get appended to the system prompt, which is also causing me problems).
To describe my use case: I currently mostly work with the `*scratch*` buffer (in Markdown mode), where my directive is inserted at the top (because I can't get gptel to prepend it to the user prompt) and the text to operate on is below that. I have to be extra careful about where I position the point, and (as shown above) I sometimes run into unexpected problems if I generate back into the same `*scratch*` buffer. That means that instead of focusing on my work, I'm thinking about how to avoid mistakes and checking/re-checking the generated query every time, sometimes losing the settings if I forget to set/save them, because the dry-run actions can reset them. And mistakes are costly with models like Opus.
To be clear, I am extremely grateful for gptel, and the workflow is still much, much better than copy/pasting into the Anthropic console, but I wish I could rely on gptel more and think less 🙂
> What I do expect is that gptel will do the same thing every time I call it, i.e. use my buffer/region contents along with the system prompt and directive to compose a request and do something with the results. I do not expect conversation awareness, context keeping, recognizing previous responses in the buffer, or really anything else. In fact, those things interfere with what I'm trying to do.

> This is not possible with Ollama. Unlike the OpenAI-inspired APIs, the Ollama API is stateful and works by passing a growing context vector back and forth along with the latest user prompt (and nothing else). Coupled with the previous point, you can see how only the last chunk is sent.
Hmm, OK. I thought it was possible to simply initiate an entirely new session with Ollama every time, with no context, or to drop the context and get a stateless API.
Closing this to focus the discussion about conversation context in #291. The issues with Ollama's API have been fixed.
I had a hard time coming up with a title for this bug, and I don't know how to reproduce it. But it's there and it can be very confusing, because the LLMs will produce bizarre results, and with LLMs you never know if it's the model that went crazy or a real bug :-)
I'm looking at a buffer that contains:
The point is at the end.
I run `gptel-menu` and look at the Lisp query:

Note how `:prompt` does not contain the full buffer contents. Marking the entire buffer as a region and trying with a region does not change anything. I would expect the entire buffer contents to be sent in `:prompt`.

(As a side note, I don't understand what those `:context` numbers are and whether they get sent to the model or not. They only appear after an LLM has been queried at least once.)

Update: if I reformat the contents like this:
and position the point on the opening parenthesis before "write-region", I get "No user prompt found!". Moving back one character results in `:prompt "-create jwr/scratch-buffer-name"`.
My buffer is in `markdown-mode`.