karthink / gptel

A simple LLM client for Emacs
GNU General Public License v3.0

Stateless design #17

Closed minad closed 10 months ago

minad commented 1 year ago

Inspired by #7, I had the idea that it would be great if gptel used a "stateless" design. If I understood you correctly, this is also how the GPT API (and LLM APIs generally) works, since you have to send everything again on every request. More precisely, the idea is that you don't maintain any internal state in gptel and instead take everything from the current buffer.
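For reference, each chat-completion request carries the entire conversation as a list of role/content messages, which is why the buffer text alone can regenerate it. A request body in the OpenAI chat format looks roughly like this (contents abbreviated):

```json
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How do I parse arguments in a bash script? ..."},
    {"role": "assistant", "content": "while getopts ..."},
    {"role": "user", "content": "Now write a function to do task X ..."}
  ]
}
```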

This change would give you restartable chats for free. Using gptel would just mean turning on gptel-mode (a buffer-local minor mode) in an already existing buffer.

karthink commented 1 year ago

More precisely the idea is that you don't maintain any internal state in gptel and instead take everything from the current buffer.

By "internal state", do you refer to the use of a text property to differentiate between queries (what you type) and responses (what ChatGPT generates)? Because otherwise gptel is already stateless. When gptel-send is invoked, it does a text-property-search-backward and builds the conversation history/context to send -- it does not maintain anything internally. How many past exchanges it searches for is controlled by one of the model parameters in the transient menu.
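A minimal sketch of what rebuilding the context from buffer text alone can look like, assuming response text carries a gptel text property with value response (which matches the description above); the helper name and exact property scan are made up for illustration, not gptel's actual code:

```elisp
;; Hypothetical sketch: recover alternating user/assistant segments by
;; scanning changes in the `gptel' text property from the top of the
;; buffer up to point.  No state outside the buffer is consulted.
(defun my/gptel-context-at-point ()
  "Collect (ROLE . TEXT) pairs between `point-min' and point."
  (save-excursion
    (let (pairs (end (point)))
      (goto-char (point-min))
      (while (< (point) end)
        (let* ((resp (get-text-property (point) 'gptel))
               ;; Position where the `gptel' property next changes,
               ;; capped at END.
               (next (next-single-property-change (point) 'gptel nil end)))
          (push (cons (if resp "assistant" "user")
                      (buffer-substring-no-properties (point) next))
                pairs)
          (goto-char next)))
      (nreverse pairs))))
```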

Your suggestion is to replace the text-property based differentiation of query/responses with markup. As I mention in the README, I did not want to make any assumptions about structure at the start since it's not clear if, for example, forcing the heading-content-heading-content structure makes sense. Right now I can have a conversation in a code buffer that looks like this:

# How do I parse arguments in a bash script? I want it to handle the arguments "-d" (that sets "download") and "-b". Respond with only bash code.

while getopts "d:b:" opt; do   # <-- response from ChatGPT
  case $opt in
    d)
      download=1
      shift
      ;;
    ... # code omitted.
  esac
done

# Now write a function to do task X, where...

Note: gptel-mode isn't even required for this; you can just type the comment and run gptel-send.

However, there's no robust way to persist text properties as metadata, so the above exchange cannot be resumed in a new Emacs session. I'm not sure adding persistence is worth giving up "structure-less" interaction in any buffer.

Another alternative is to use markup-based conversations in dedicated gptel buffers (in Org or Markdown as you describe), and continue to use text-properties otherwise, but this makes the code messier and harder to maintain on the whole.

I don't have any strong opinions about this yet, I'm still experimenting to see what's possible/useful behavior! Let me know what you think.

minad commented 1 year ago

By "internal state", do you refer to the use of a text property to differentiate between queries (what you type) and responses (what ChatGPT generates)?

Yes, and also the use of variables to maintain the GPT parameters. I think it would be better to store them in the file too, such that the entire state of the conversation is stored as plain text. If you are running the conversation from a different buffer (as in your programming use case), the parameters could also be stored in file local variables.

Your suggestion is to replace the text-property based differentiation of query/responses with markup.

Essentially yes. But the markup should ideally be very lightweight. In Org, you could mark headers with a special :gptel: tag for example.

I'm not sure if adding persistence is worth giving up "structure-less" interaction in any buffer.

Ideally we would end up with something that is still structure-less (or as structure-less as possible) while still supporting persistence. One should also note that gptel already makes some assumptions about structure via gptel-prompt-string.

Another alternative is to use markup-based conversations in dedicated gptel buffers (in Org or Markdown as you describe), and continue to use text-properties otherwise, but this makes the code messier and harder to maintain on the whole.

This is a route I wouldn't take. I would stick to the idea of using org/markdown/prog-mode buffers. I would also stick to the idea of staying mostly structure-less, but only to the extent that allows eliminating other internal state (text properties and maybe parameters).

Using text only is a powerful concept and also very Emacsy, though it may conflict a little with the goal of creating a fully polished UI in the style of a browser or other apps. I would still take the plain-text approach, since I believe it just fits better into Emacs.

CyberShadow commented 1 year ago

I think it would be better to store them in the file too, such that the entire state of the conversation is stored as plain text.

+1, using Markdown front-matter for all parameters would be great. To avoid clutter the front matter could be hidden by default.

Your suggestion is to replace the text-property based differentiation of query/responses with markup.

+1, it would be nice if it was possible to continue conversations by saving the file and then opening it again.

Actually, it would be nice if all conversations were backed by a file on disk.

A few emergent properties would result from this:

karthink commented 1 year ago

@CyberShadow Storing and reading the chat parameters from front matter in Markdown (or a property drawer in Org) is quite simple. However, we also need to store the boundaries demarcating prompts and responses. Reading headings as prompts and the text body as responses is too limiting: you can't have a long prompt that includes a bulleted list of instructions to ChatGPT, for example. See @minad's point above about using a format that is as structure-less as possible. Do you have any ideas on how to do this?

CyberShadow commented 1 year ago

Yes. I agree that imposing typing overhead on users' prompts would be annoying, so I had the following syntax in mind:

Lines with no prefix are user input. ("role":"user")
Lines beginning with > are used for responses from the model. ("role":"assistant")
Lines beginning with # are used for system prompts. ("role":"system")

I think this gets us close to being able to represent with 100% fidelity all possible inputs to the API endpoint. A few corner cases are not representable (trailing newlines, or several consecutive message items with the same "role":"user"), but I think this is acceptable. There's also the case of ">" or "#" at the start of a line in user input, though if we really need that, it could be represented by space-stuffing as in RFC 3676.

For ease of use the major mode could implement some niceties which do not detract from the stateless design or fidelity of representation. For example, hitting Return while point is on a line which starts with > or # could prefix the new line with the same character.
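A sketch of the Return-key nicety described above, assuming the hypothetical prefix syntax (the command name is made up; this is not gptel code):

```elisp
;; Hypothetical: on RET, carry a leading ">" or "#" prefix over to the
;; new line, so multi-line responses or system prompts stay marked
;; without extra typing.
(defun my/gptel-newline-continue-prefix ()
  "Insert a newline, repeating any \">\" or \"#\" line prefix."
  (interactive)
  (let ((prefix (save-excursion
                  (beginning-of-line)
                  (when (looking-at "\\([>#] ?\\)")
                    (match-string 1)))))
    (newline)
    (when prefix (insert prefix))))
```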

Does that sound reasonable?

karthink commented 1 year ago

Lines beginning with > are used for responses from the model. ("role":"assistant")

The model's response can include code blocks, or other kinds of formatted output. Prepending a > to them destroys the markup.

Lines beginning with # are used for system prompts. ("role":"system")

These are comments in org-mode and headings in Markdown -- this means every comment/heading is interpreted as a system prompt?

Choosing other characters for these purposes presents similar conflicts, and may confuse users unfamiliar with them, for example if we use ! or == at the beginning of a line to denote a system prompt. The user might also edit these markers in the course of using the document as a general-purpose Org/Markdown file.

My idea so far is to handle this internally and without imposing any markup or syntax, along the following lines. (I'm using markdown-mode as an example; a similar system would work for org-mode.)

  1. The front matter has a field, let's say "locations", that is a list of integers. Each integer is the value of (point) at the boundary between a prompt and a response.
  2. When gptel-mode is turned on, we read the locations and start tracking them with markers or text-properties. At this point, yes, the system is no longer stateless.
  3. When gptel-mode is turned off or when the file is written to disk, we update the locations list with the boundary information. All the state is confined to the file again.
  4. Since we know what regions of the buffer are prompts and responses, gptel-mode can optionally use a visual indicator (like subtle fontification) to convey this to the user. We can also distinguish visually between prompts and pre-existing text that was not fed to ChatGPT this way.
  5. With a slightly richer data structure than a list -- that is still not too ugly when serialized as front matter -- we can track the model that buffer content came from, such as a response from GPT-3 that was used as a prompt fed to DALL-E to produce an image.
  6. Finally, using gptel in any buffer without turning on gptel-mode is possible (like right now) but there's no persistence.

Essentially: Instead of storing the state metadata separately in an auxiliary file or creating syntax and imbuing it with meaning, we store the metadata in the file itself as TOML-style front matter or as Org property drawers.
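Steps 2 and 3 above can be sketched as follows; the variable and function names are hypothetical, and the front-matter reader/writer is assumed to exist elsewhere:

```elisp
;; Hypothetical sketch: convert saved integer positions into markers on
;; mode entry (so they track edits), and back into integers on save.
(defvar-local my/gptel--boundary-markers nil
  "Markers at prompt/response boundaries in this buffer.")

(defun my/gptel--restore-locations (locations)
  "Turn LOCATIONS, a list of buffer positions, into live markers.
Meant to run when `gptel-mode' is enabled, after reading the
\"locations\" field from the front matter."
  (setq my/gptel--boundary-markers
        (mapcar (lambda (pos)
                  (let ((m (make-marker)))
                    (set-marker m pos (current-buffer))
                    m))
                locations)))

(defun my/gptel--serialize-locations ()
  "Return current boundary positions as integers for the front matter.
Meant to run from `before-save-hook' or when `gptel-mode' is disabled."
  (mapcar #'marker-position my/gptel--boundary-markers))
```

Markers (rather than plain integers or text properties) keep the boundaries valid as the user edits the buffer, at the cost of being in-memory state while the file is open.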

The advantages are that

The disadvantages are that

What do you think?

(@minad feel free to weigh in.)

CyberShadow commented 1 year ago

The model's response can include code blocks, or other kinds of formatted output. Prepending a > to them destroys the markup.

It should not.

Quoted paragraph

code block in quoted paragraph

Continuation of quoted paragraph

Source code for the above:

> Quoted paragraph
> 
> ```
> code block in quoted paragraph
> ```
>
> Continuation of quoted paragraph

These are comments in org-mode and headings in Markdown -- this means every comment/heading is interpreted as a system prompt?

Sorry, when would this be a problem? I don't think I've ever needed to type a Markdown heading or quote into GPT.

CyberShadow commented 1 year ago

What do you think?

I admit it would work. I can think of these minor points:

If the main concern is conflict with user-typed quote blocks / headings, a more distinct prefix could be chosen, such as GPT> or GPT-SYSTEM:.

Does that make sense?

Alan-Chen99 commented 1 year ago

I'm currently using file-name-handler-alist to save gptel buffers to a file:


(setf (get 'gptel--system-message 'permanent-local) t)
(defun gptel-run-real-handler (operation &rest args)
    (let ((inhibit-file-name-handlers
              (cons #'gptel-file-handler
                  (and (eq inhibit-file-name-operation operation)
                      inhibit-file-name-handlers)))
             (inhibit-file-name-operation operation))
        (apply operation args)))
(defun gptel-insert-file-contents (filename &optional visit beg end replace)
    (with-undo-amalgamate
        (let (obj ans)
            ;; FIXME: honor replace == nil
            (delete-region (point-min) (point-max))
            (setq ans (gptel-run-real-handler 'insert-file-contents filename visit beg end replace))
            (goto-char (point-min))
            (setq obj (read (current-buffer)))
            (delete-region (point-min) (point-max))
            (mapc
                (lambda (x)
                    (let ((content (plist-get x :content)))
                        (pcase (plist-get x :role)
                            ("system" (setq-local gptel--system-message content))
                            ("user" (insert content))
                            ("assistant" (insert (propertize content 'gptel 'response))))))
                obj)
            ans)))
(defun string-trim-ignore-advice (str &rest _)
    str)
(defun gptel-write-region (_start _end filename &optional append visit lockname mustbenew)
    (when append (error "append not supported"))
    (let (ans gptel--num-messages-to-send obj)
        (save-excursion
            (save-restriction
                ;; FIXME: respect start + end
                (widen)
                (goto-char (point-max))
                (catch 'revert!
                    (atomic-change-group
                        (advice-add #'string-trim :override #'string-trim-ignore-advice)
                        (unwind-protect
                            (setq obj (gptel--create-prompt))
                            (advice-remove #'string-trim #'string-trim-ignore-advice))
                        (delete-region (point-min) (point-max))
                        ;; FIXME: pp settings ought to be set
                        (pp obj (current-buffer))
                        (setq ans
                            (gptel-run-real-handler 'write-region
                                (point-min) (point-max) filename
                                nil visit lockname mustbenew))
                        (throw 'revert! nil)))
                ans))))
(defun gptel-file-handler (operation &rest args)
    (cond ((eq operation 'insert-file-contents)
              (apply #'gptel-insert-file-contents args))
        ((eq operation 'write-region)
            (apply #'gptel-write-region args))
        (t (apply #'gptel-run-real-handler operation args))))
(add-to-list 'file-name-handler-alist
    (cons (rx ".gpt" eos)
        #'gptel-file-handler))

When I have a bit more time, I can fix some things in this and make a PR.

karthink commented 1 year ago

Support for saving and restoring state has been added for Org mode buffers. Saving a gptel buffer to disk will save gptel metadata as Org properties. Opening an Org file and turning on gptel-mode will cause the state (if available in the file) to be restored, and the conversation can be continued.
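For illustration, the saved metadata in an Org buffer ends up in a property drawer along these lines; the exact property names and value formats here are indicative of the approach, not a guaranteed file format:

```
:PROPERTIES:
:GPTEL_MODEL: gpt-3.5-turbo
:GPTEL_BOUNDS: ((75 . 432) (501 . 980))
:END:
```

The bounds pairs record the buffer regions that were model responses, so the text-property information survives the round trip to disk.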

See also M-x gptel-set-topic, which can be used to limit a conversation context to an Org heading.

Support for Markdown mode is pending.

gptel remains stateful when the file is unsaved. I have yet to find a way to do this that does not involve adding additional syntax like a "Response: " prefix or heading. But gptel conversations in Org mode buffers can be saved to disk and resumed now.

karthink commented 1 year ago

Support for saving chats to Markdown/Text files has been added. The implementation isn't very satisfying compared to Org (I'm using file-local variables), but it works.
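For illustration, a file-local variables block in a Markdown file could look like the following; the variable names and value format are indicative only, and the HTML-comment prefix/suffix is just Emacs's standard mechanism for embedding local variables in files whose syntax would otherwise reject them:

```
<!-- Local Variables: -->
<!-- gptel--model: "gpt-3.5-turbo" -->
<!-- gptel--bounds: ((75 . 432) (501 . 980)) -->
<!-- End: -->
```

Emacs reads such a block at the end of the file when it is visited, stripping the per-line prefix and suffix, which is why this works in Markdown even though it has no native front-matter convention that Emacs understands.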

karthink commented 10 months ago

I have no actionable ideas at the moment to make gptel completely stateless (without adding syntax, which I don't want to do), so I am moving it to discussions for now.