Closed jwr closed 2 months ago
Ok, having taken some time to read the code, I think I'm slightly less confused as to what might be going on. But here's a thought: I think I have different expectations when calling gptel from my buffer.
What I do expect is that gptel will do the same thing every time I call it, i.e. use my buffer/region contents along with the system prompt and directive to compose a request and do something with the results. I do not expect conversation awareness, context keeping, recognizing previous responses in the buffer, or really anything else. In fact, those things interfere with what I'm trying to do.
I think it would be great if gptel allowed me to use it in this simple "deterministic mode".
After reading the code, it seems that some of my confusion comes from the fact that `gptel-menu` is used both for setting options in gptel buffers and for using gptel in other buffers. The code is also shared. But perhaps those are really two different scenarios with common options?
The confusion is due to a combination of two different behaviors.
If you receive a response from a model and edit it, your edits are not considered part of the response; they become an additional user prompt in between. The other way to handle this would be to consider edits to the response as part of the response. I've gone back and forth between the two approaches; there are valid reasons for wanting both behaviors.
> What I do expect is that gptel will do the same thing every time I call it, i.e. use my buffer/region contents along with the system prompt and directive to compose a request and do something with the results. I do not expect conversation awareness, context keeping, recognizing previous responses in the buffer, or really anything else. In fact, those things interfere with what I'm trying to do.
This is not possible with Ollama. Unlike the OpenAI-inspired APIs, the Ollama API is stateful and works by passing a growing context vector back and forth along with the latest user prompt (and nothing else). Coupled with the previous point, you can see how only the last chunk is sent.
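For illustration only (this is not gptel's code), the context-passing behavior described above corresponds to Ollama's native `/api/generate` endpoint: each response carries an opaque `context` array, and the client echoes it back with only the newest prompt. A sketch of the exchange, with a placeholder model name and made-up token values:

```jsonc
// first request: just the prompt
{ "model": "mistral", "prompt": "Hello", "stream": false }

// response includes an opaque context vector
{ "response": "Hi there!", "context": [1, 529, 16289], "done": true }

// follow-up: the client sends the context back plus ONLY the new prompt;
// earlier prompts and responses are never re-sent as text
{ "model": "mistral", "prompt": "Tell me more", "context": [1, 529, 16289] }
```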
Ollama plans to offer, or perhaps already offers, an OpenAI-compatible API. I haven't looked into how to access it yet; you can try using that (by creating a backend with `gptel-make-openai`) for stateless behavior.
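As a rough sketch of what such a backend definition might look like (assuming Ollama exposes its OpenAI-compatible API at `localhost:11434/v1`; the backend and model names below are placeholders, not something gptel or Ollama prescribes):

```elisp
;; Hypothetical: register Ollama's OpenAI-compatible endpoint as a
;; stateless gptel backend, instead of the native (stateful) Ollama one.
(gptel-make-openai "Ollama-OpenAI"          ; arbitrary backend name
  :host "localhost:11434"                   ; assumed default Ollama port
  :protocol "http"                          ; local server, no TLS
  :endpoint "/v1/chat/completions"          ; OpenAI-compatible route
  :stream t
  :models '("mistral:latest"))              ; placeholder model name
```

Selecting this backend would then send the full message list with every request, like the other OpenAI-style backends, rather than a context vector.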
> If you receive a response from a model and edit it, your edits are not considered part of the response; they become an additional user prompt in between. The other way to handle this would be to consider edits to the response as part of the response. I've gone back and forth between the two approaches; there are valid reasons for wanting both behaviors.
Hmm. I can see how this is useful, but I find myself running into unexpected behavior (like what I reported above) and mistrusting gptel as a result. When working with my own buffers, I'd really like a "simple mode" where I can always predict what gptel will send as a query. Usually that would be the text I selected, or the entire buffer contents, with the directive prepended (directives currently get appended to the system prompt, which is also causing me problems).
To describe my use case: I currently mostly work with the `*scratch*` buffer (in Markdown mode), where my directive is inserted at the top (because I can't get gptel to prepend it to the user prompt) and the text to operate on is below that. I have to be extra careful about where I position the point, and (as shown above) I sometimes run into unexpected problems if I generate back into the same `*scratch*` buffer. That means that instead of focusing on my work, I'm thinking about how to avoid mistakes and checking/re-checking the generated query every time, sometimes losing the settings if I forget to set/save them, because the dry-run actions can reset them. And mistakes are costly with models like Opus.
To be clear, I am extremely grateful for gptel, and the workflow is still much, much better than copy/pasting into the Anthropic console, but I wish I could rely on gptel more and think less 🙂
> What I do expect is that gptel will do the same thing every time I call it, i.e. use my buffer/region contents along with the system prompt and directive to compose a request and do something with the results. I do not expect conversation awareness, context keeping, recognizing previous responses in the buffer, or really anything else. In fact, those things interfere with what I'm trying to do.

> This is not possible with Ollama. Unlike the OpenAI-inspired APIs, the Ollama API is stateful and works by passing a growing context vector back and forth along with the latest user prompt (and nothing else). Coupled with the previous point, you can see how only the last chunk is sent.
Hmm, OK. I thought it was possible to simply initiate an entirely new session with Ollama every time, with no context, or to drop the context and get a stateless API.
Closing this to focus the discussion about conversation context in #291. The issues with Ollama's API have been fixed.
I had a hard time coming up with a title for this bug, and I don't know how to reproduce it. But it's there and it can be very confusing, because the LLMs will produce bizarre results, and with LLMs you never know if it's the model that went crazy or a real bug :-)
I'm looking at a buffer that contains:
The point is at the end.
I run `gptel-menu` and look at the Lisp query:

Note how `:prompt` does not contain the full buffer contents. Marking the entire buffer as a region and trying with a region does not change anything. I would expect the entire buffer contents to be sent in `:prompt`.

(As a side note, I don't understand what those `:context` numbers are and whether they get sent to the model or not. They only appear after an LLM has been queried at least once.)

Update: if I reformat the contents like this:
and position the point on the opening parenthesis before "write-region", I get "No user prompt found!". Moving back one character results in `:prompt "-create jwr/scratch-buffer-name"`.
My buffer is in `markdown-mode`.