karthink / gptel

A simple LLM client for Emacs
GNU General Public License v3.0

Stateless design #17

Closed minad closed 10 months ago

minad commented 1 year ago

Inspired by #7, I had the idea that it would be great if gptel used a "stateless" design. If I understood you correctly, this is also how the GPT API (and LLM APIs generally) works, since you have to send everything again on every request. More precisely, the idea is that you don't maintain any internal state in gptel and instead take everything from the current buffer.
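For reference, each chat-completion request carries the entire conversation as a list of role/content messages, which is why the buffer text alone can regenerate it. A request body in the OpenAI chat format looks roughly like this (contents abbreviated):

```json
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How do I parse arguments in a bash script? ..."},
    {"role": "assistant", "content": "while getopts ..."},
    {"role": "user", "content": "Now write a function to do task X ..."}
  ]
}
```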

This change would give you restartable chats for free. Using gptel would just mean turning on gptel-mode (a buffer-local minor mode) in an already existing buffer.

karthink commented 1 year ago

More precisely the idea is that you don't maintain any internal state in gptel and instead take everything from the current buffer.

By "internal state", do you refer to the use of a text property to differentiate between queries (what you type) and responses (what ChatGPT generates)? Because otherwise gptel is already stateless. When gptel-send is invoked, it does a text-property-search-backward and builds the conversation history/context to send -- it does not maintain anything internally. How many past exchanges it searches for is controlled by one of the model parameters in the transient menu.
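A minimal sketch of what rebuilding the context from buffer text alone can look like, assuming response text carries a gptel text property with value response (which matches the description above); the helper name and exact property scan are made up for illustration, not gptel's actual code:

```elisp
;; Hypothetical sketch: recover alternating user/assistant segments by
;; scanning changes in the `gptel' text property from the top of the
;; buffer up to point.  No state outside the buffer is consulted.
(defun my/gptel-context-at-point ()
  "Collect (ROLE . TEXT) pairs between `point-min' and point."
  (save-excursion
    (let (pairs (end (point)))
      (goto-char (point-min))
      (while (< (point) end)
        (let* ((resp (get-text-property (point) 'gptel))
               ;; Position where the `gptel' property next changes,
               ;; capped at END.
               (next (next-single-property-change (point) 'gptel nil end)))
          (push (cons (if resp "assistant" "user")
                      (buffer-substring-no-properties (point) next))
                pairs)
          (goto-char next)))
      (nreverse pairs))))
```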

Your suggestion is to replace the text-property based differentiation of query/responses with markup. As I mention in the README, I did not want to make any assumptions about structure at the start since it's not clear if, for example, forcing the heading-content-heading-content structure makes sense. Right now I can have a conversation in a code buffer that looks like this:

# How do I parse arguments in a bash script? I want it to handle the arguments "-d" (that sets "download") and "-b". Respond with only bash code.

while getopts "d:b:" opt; do   # <-- response from ChatGPT
  case $opt in
    d)
      download=1
      shift
      ;;
    ... # code omitted.
  esac
done

# Now write a function to do task X, where...

Note: gptel-mode isn't even required for this; you can just type the comment and run gptel-send.

However, there's no robust way to persist text properties as metadata, so the above exchange cannot be resumed in a new Emacs session. I'm not sure adding persistence is worth giving up "structure-less" interaction in any buffer.

Another alternative is to use markup-based conversations in dedicated gptel buffers (in Org or Markdown as you describe), and continue to use text-properties otherwise, but this makes the code messier and harder to maintain on the whole.

I don't have any strong opinions about this yet, I'm still experimenting to see what's possible/useful behavior! Let me know what you think.

minad commented 1 year ago

By "internal state", do you refer to the use of a text property to differentiate between queries (what you type) and responses (what ChatGPT generates)?

Yes, and also the use of variables to maintain the GPT parameters. I think it would be better to store them in the file too, such that the entire state of the conversation is stored as plain text. If you are running the conversation from a different buffer (as in your programming use case), the parameters could also be stored in file local variables.

Your suggestion is to replace the text-property based differentiation of query/responses with markup.

Essentially yes. But the markup should ideally be very lightweight. In Org, you could mark headers with a special :gptel: tag for example.

I'm not sure if adding persistence is worth giving up "structure-less" interaction in any buffer.

Ideally we would end up with something that is still structure-less (or as structure-less as possible) while still supporting persistence. One should also note that gptel already makes some assumptions about structure via gptel-prompt-string.

Another alternative is to use markup-based conversations in dedicated gptel buffers (in Org or Markdown as you describe), and continue to use text-properties otherwise, but this makes the code messier and harder to maintain on the whole.

This is a route I wouldn't take. I would stick to the idea of using org/markdown/prog-mode buffers. I would also stick to the idea of staying mostly structure-less, but only to the extent that allows eliminating other internal state (text properties and maybe parameters).

Using text only is a powerful concept and also very Emacsy, though it may conflict a little with the goal of creating a fully polished UI in the style of a browser or other apps. I would still take the plain-text approach, since I believe it just fits better into Emacs.

CyberShadow commented 1 year ago

I think it would be better to store them in the file too, such that the entire state of the conversation is stored as plain text.

+1, using Markdown front-matter for all parameters would be great. To avoid clutter the front matter could be hidden by default.

Your suggestion is to replace the text-property based differentiation of query/responses with markup.

+1, it would be nice if it was possible to continue conversations by saving the file and then opening it again.

Actually, it would be nice if all conversations were backed by a file on disk.

A few emergent properties would result from this:

karthink commented 1 year ago

@CyberShadow Storing and reading the chat parameters from front matter in Markdown (or a property drawer in Org) is quite simple. However, we also need to store the boundaries demarcating prompts and responses. Reading headings as prompts and the text body as responses is too limiting: you can't have a long prompt that includes a bulleted list of instructions to ChatGPT, for example. See @minad's point above about using a format that is as structure-less as possible. Do you have any ideas on how to do this?

CyberShadow commented 1 year ago

Yes. I agree that imposing typing overhead on users' prompts would be annoying, so I had the following syntax in mind:

Lines with no prefix are user input. ("role":"user")
Lines beginning with > are used for responses from the model. ("role":"assistant")
Lines beginning with # are used for system prompts. ("role":"system")

I think this gets us close to being able to represent with 100% fidelity all possible inputs to the API endpoint. A few corner cases are not representable (trailing newlines, or several consecutive message items with the same "role":"user"), but I think this is acceptable. There's also the case of ">" or "#" at the start of a line in user input, though if we really need that, it could be represented by space-stuffing as in RFC 3676.

For ease of use the major mode could implement some niceties which do not detract from the stateless design or fidelity of representation. For example, hitting Return while point is on a line which starts with > or # could prefix the new line with the same character.
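A sketch of the Return-key nicety described above, assuming the hypothetical prefix syntax (the command name is made up; this is not gptel code):

```elisp
;; Hypothetical: on RET, carry a leading ">" or "#" prefix over to the
;; new line, so multi-line responses or system prompts stay marked
;; without extra typing.
(defun my/gptel-newline-continue-prefix ()
  "Insert a newline, repeating any \">\" or \"#\" line prefix."
  (interactive)
  (let ((prefix (save-excursion
                  (beginning-of-line)
                  (when (looking-at "\\([>#] ?\\)")
                    (match-string 1)))))
    (newline)
    (when prefix (insert prefix))))
```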

Does that sound reasonable?

karthink commented 1 year ago

Lines beginning with > are used for responses from the model. ("role":"assistant")

The model's response can include code blocks, or other kinds of formatted output. Prepending a > to them destroys the markup.

Lines beginning with # are used for system prompts. ("role":"system")

These are comments in org-mode and headings in Markdown -- this means every comment/heading is interpreted as a system prompt?

Choosing other characters for these purposes presents similar conflicts, and may confuse users unfamiliar with them, for example if we use ! or == at the beginning of a line to denote a system prompt. The user might also edit these markers in the course of using the document as a general-purpose Org/Markdown file.

My idea so far is to handle this internally and without imposing any markup or syntax, along the following lines. (I'm using markdown-mode as an example; a similar system would work for org-mode.)

  1. The front matter has a field, let's say "locations", that is a list of integers. Each integer is the value of (point) at the boundary between a prompt and a response.
  2. When gptel-mode is turned on, we read the locations and start tracking them with markers or text-properties. At this point, yes, the system is no longer stateless.
  3. When gptel-mode is turned off or when the file is written to disk, we update the locations list with the boundary information. All the state is confined to the file again.
  4. Since we know what regions of the buffer are prompts and responses, gptel-mode can optionally use a visual indicator (like subtle fontification) to convey this to the user. We can also distinguish visually between prompts and pre-existing text that was not fed to ChatGPT this way.
  5. With a slightly richer data structure than a list -- that is still not too ugly when serialized as front matter -- we can track the model that buffer content came from, such as a response from GPT-3 that was used as a prompt fed to DALL-E to produce an image.
  6. Finally, using gptel in any buffer without turning on gptel-mode is possible (like right now) but there's no persistence.

Essentially: Instead of storing the state metadata separately in an auxiliary file or creating syntax and imbuing it with meaning, we store the metadata in the file itself as TOML-style front matter or as Org property drawers.
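Steps 2 and 3 above can be sketched as follows; the variable and function names are hypothetical, and the front-matter reader/writer is assumed to exist elsewhere:

```elisp
;; Hypothetical sketch: convert saved integer positions into markers on
;; mode entry (so they track edits), and back into integers on save.
(defvar-local my/gptel--boundary-markers nil
  "Markers at prompt/response boundaries in this buffer.")

(defun my/gptel--restore-locations (locations)
  "Turn LOCATIONS, a list of buffer positions, into live markers.
Meant to run when `gptel-mode' is enabled, after reading the
\"locations\" field from the front matter."
  (setq my/gptel--boundary-markers
        (mapcar (lambda (pos)
                  (let ((m (make-marker)))
                    (set-marker m pos (current-buffer))
                    m))
                locations)))

(defun my/gptel--serialize-locations ()
  "Return current boundary positions as integers for the front matter.
Meant to run from `before-save-hook' or when `gptel-mode' is disabled."
  (mapcar #'marker-position my/gptel--boundary-markers))
```

Markers (rather than plain integers or text properties) keep the boundaries valid as the user edits the buffer, at the cost of being in-memory state while the file is open.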

The advantages are that

The disadvantages are that

What do you think?

(@minad feel free to weigh in.)

CyberShadow commented 1 year ago

The model's response can include code blocks, or other kinds of formatted output. Prepending a > to them destroys the markup.

It should not.

Quoted paragraph

code block in quoted paragraph

Continuation of quoted paragraph

Source code for the above:

> Quoted paragraph
> 
> ```
> code block in quoted paragraph
> ```
>
> Continuation of quoted paragraph

These are comments in org-mode and headings in Markdown -- this means every comment/heading is interpreted as a system prompt?

Sorry, when would this be a problem? I don't think I've ever needed to type a Markdown heading or quote into GPT.

CyberShadow commented 1 year ago

What do you think?

I admit it would work. I can think of these minor points:

If the main concern is conflict with user-typed quote blocks / headings, a more distinct prefix could be chosen, such as GPT> or GPT-SYSTEM:.

Does that make sense?

Alan-Chen99 commented 1 year ago

I'm currently using file-name-handler-alist to save gptel buffers to a file:


(setf (get 'gptel--system-message 'permanent-local) t)
(defun gptel-run-real-handler (operation &rest args)
    (let ((inhibit-file-name-handlers
              (cons #'gptel-file-handler
                  (and (eq inhibit-file-name-operation operation)
                      inhibit-file-name-handlers)))
             (inhibit-file-name-operation operation))
        (apply operation args)))
(defun gptel-insert-file-contents (filename &optional visit beg end replace)
    (with-undo-amalgamate
        (let (obj ans)
            ;; FIXME: honor replace == nil
            (delete-region (point-min) (point-max))
            (setq ans (gptel-run-real-handler 'insert-file-contents filename visit beg end replace))
            (goto-char (point-min))
            (setq obj (read (current-buffer)))
            (delete-region (point-min) (point-max))
            (mapc
                (lambda (x)
                    (let ((content (plist-get x :content)))
                        (pcase (plist-get x :role)
                            ("system" (setq-local gptel--system-message content))
                            ("user" (insert content))
                            ("assistant" (insert (propertize content 'gptel 'response))))))
                obj)
            ans)))
(defun string-trim-ignore-advice (str &rest _)
    str)
(defun gptel-write-region (_start _end filename &optional append visit lockname mustbenew)
    (when append (error "append not supported"))
    (let (ans gptel--num-messages-to-send obj)
        (save-excursion
            (save-restriction
                ;; FIXME: respect start + end
                (widen)
                (goto-char (point-max))
                (catch 'revert!
                    (atomic-change-group
                        (advice-add #'string-trim :override #'string-trim-ignore-advice)
                        (unwind-protect
                            (setq obj (gptel--create-prompt))
                            (advice-remove #'string-trim #'string-trim-ignore-advice))
                        (delete-region (point-min) (point-max))
                        ;; FIXME: pp settings ought to be set
                        (pp obj (current-buffer))
                        (setq ans
                            (gptel-run-real-handler 'write-region
                                (point-min) (point-max) filename
                                nil visit lockname mustbenew))
                        (throw 'revert! nil)))
                ans))))
(defun gptel-file-handler (operation &rest args)
    (cond ((eq operation 'insert-file-contents)
              (apply #'gptel-insert-file-contents args))
        ((eq operation 'write-region)
            (apply #'gptel-write-region args))
        (t (apply #'gptel-run-real-handler operation args))))
(add-to-list 'file-name-handler-alist
    (cons (rx ".gpt" eos)
        #'gptel-file-handler))

When I have a bit more time, I can fix some things in this and make a PR.

karthink commented 1 year ago

Support for saving and restoring state has been added for Org mode buffers. Saving a gptel buffer to disk will save gptel metadata as Org properties. Opening an Org file and turning on gptel-mode will cause the state (if available in the file) to be restored, and the conversation can be continued.
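For illustration, the saved metadata in an Org buffer ends up in a property drawer along these lines; the exact property names and value formats here are indicative of the approach, not a guaranteed file format:

```
:PROPERTIES:
:GPTEL_MODEL: gpt-3.5-turbo
:GPTEL_BOUNDS: ((75 . 432) (501 . 980))
:END:
```

The bounds pairs record the buffer regions that were model responses, so the text-property information survives the round trip to disk.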

See also M-x gptel-set-topic, which can be used to limit a conversation context to an Org heading.

Support for Markdown mode is pending.

gptel remains stateful when the file is unsaved. I have yet to find a way to do this that does not involve adding additional syntax like a "Response: " prefix or heading. But gptel conversations in Org mode buffers can be saved to disk and resumed now.

karthink commented 1 year ago

Support for saving chats to Markdown/Text files has been added. The implementation isn't very satisfying compared to Org (I'm using file-local variables), but it works.
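For illustration, a file-local variables block in a Markdown file could look like the following; the variable names and value format are indicative only, and the HTML-comment prefix/suffix is just Emacs's standard mechanism for embedding local variables in files whose syntax would otherwise reject them:

```
<!-- Local Variables: -->
<!-- gptel--model: "gpt-3.5-turbo" -->
<!-- gptel--bounds: ((75 . 432) (501 . 980)) -->
<!-- End: -->
```

Emacs reads such a block at the end of the file when it is visited, stripping the per-line prefix and suffix, which is why this works in Markdown even though it has no native front-matter convention that Emacs understands.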

karthink commented 10 months ago

I have no actionable ideas at the moment to make gptel completely stateless (without adding syntax, which I don't want to do), so I am moving it to discussions for now.