karthink / gptel

A simple LLM client for Emacs
GNU General Public License v3.0

Request for comment: Use Org mode properties drawer to configure system prompt/temp for the whole subtree #141

Closed: doctorguile closed this issue 3 months ago

doctorguile commented 7 months ago

I ran into the same bug as https://github.com/karthink/gptel/issues/135

But ultimately, even if I hadn't run into this bug, I think the transient menu is not as convenient when you have a big list of predefined combos.

For an example of a very convenient way to access a set of user-defined system prompts, models, temperatures, etc., see

https://github.com/Bin-Huang/chatbox

where you can set a system prompt, temperature and model for each chat thread, and continue on that thread without having to repeat the context (system prompt, etc.).

The ideal way to do this in Emacs is in Org mode: create a heading and set the system prompt, temperature and model in its property drawer.

Then the whole subtree under that heading will inherit these properties, and you can have a lot of useful prompt/temperature combinations always ready to go.

example

* Emacs
:PROPERTIES:
:SYSPROMPT: You are an Emacs expert. You can help me by answering my questions. You can also ask me questions to clarify my intention.
:temperature: 0.1
:model: GPT4
:END:

** How to call REST api?
conversation re Emacs continues in this subtree...

* Python
:PROPERTIES:
:SYSPROMPT: you are a python expert. respond to the task with detailed step by step instructions and code. list out all the files to be created and how to put them together
:temperature: 0.2
:model: GPT3.5
:END:

** How to setup a webserver with http3 support?
conversation re Python continues in this subtree...

Does anyone else think this would be a good addition? This is how Org is used (inheriting parent properties) for a lot of other use cases.

karthink commented 7 months ago

This works on a per-file basis in Org-mode right now when you save the buffer. (Except for Ollama models, where it's more annoying to capture state.)

I'm not convinced it's worth adding per-heading support, but I'll think about it.

NightMachinery commented 6 months ago

@karthink I think it's worth adding. I like to have a mega file where I store all my chats, and each chat is a new heading. Opening a new file for small chats is not worth it.

I also put chats under headings where they make sense. For example, I have a heading How to do X with a subheading LLM Chatlog, under that a GPT4 heading, and under that the actual chat. Linking to a file instead makes things messy.

Ypot commented 6 months ago

Take a look at "inline tasks". Those would allow changing the sysprompt by adding an "isolated" subheading, which means you can end the body of the headline with a "*** END".

I agree with the inheritance.

An example, in an old suggestion: https://github.com/karthink/gptel/issues/103#issuecomment-1685196575

karthink commented 6 months ago

I also put chats under headings where they make sense. For example, I have a heading How to do X with a subheading LLM Chatlog, under that a GPT4 heading, and under that the actual chat. Linking to a file instead makes things messy.

This sounds good to me, but I'm not sure I understand what you mean fully. You can already have an LLM Chatlog > GPT4 subheading, and store/continue the conversation there, if you have added a GPTEL_TOPIC property to the heading. (With org-set-property or more easily with gptel-set-topic)
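
For reference, a topic-scoped heading ends up looking something like this (the topic name here is illustrative; you pick it when calling gptel-set-topic):

** LLM Chatlog
:PROPERTIES:
:GPTEL_TOPIC: how-to-do-x
:END: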

What's missing here is the specification of the model and backend parameters for this specific chat. So if these are not the default values, you would have to change them manually from elisp or from the transient menu. If other chat logs in the same file use different models/backends, you'd have to switch manually each time before continuing those chats. Is this the problem you want to solve?

NightMachinery commented 6 months ago

I also put chats under headings where they make sense. For example, I have a heading How to do X with a subheading LLM Chatlog, under that a GPT4 heading, and under that the actual chat. Linking to a file instead makes things messy.

This sounds good to me, but I'm not sure I understand what you mean fully. You can already have an LLM Chatlog > GPT4 subheading, and store/continue the conversation there, if you have added a GPTEL_TOPIC property to the heading. (With org-set-property or more easily with gptel-set-topic)

What's missing here is the specification of the model and backend parameters for this specific chat. So if these are not the default values, you would have to change them manually from elisp or from the transient menu. If other chat logs in the same file use different models/backends, you'd have to switch manually each time before continuing those chats. Is this the problem you want to solve?

Yes, I like to be able to set all the parameters using property drawers. I want the chats to record all their inputs in the text file, so that years later, I can revisit them and know exactly what parameters etc. I used.


Another problem is that the prefix (gptel-prompt-prefix-alist) needs to be hardcoded to a single string, e.g., ***. Setting this to a regex, e.g., ^\*+\s+ seems more sensible. This would allow us to put the chats at any heading depth.

karthink commented 6 months ago

Yes, I like to be able to set all the parameters using property drawers.

There are some ambiguities in the behavior of gptel that I need to sort out to do this; I'll explain soon.

I want the chats to record all their inputs in the text file, so that years later, I can revisit them and know exactly what parameters etc. I used.

This (archival) is a different -- and simpler -- issue from being able to set the parameters. You can do this right now with a little elisp:

(org-entry-put nil "GPTEL_TEMPERATURE" (number-to-string gptel-temperature)) ; property values must be strings
(org-entry-put nil "GPTEL_MODEL" gptel-model)
(org-entry-put nil "GPTEL_BACKEND" (gptel-backend-name gptel-backend)) ; store the backend's name, not the struct

and so on, placed in a function in before-save-hook perhaps. (gptel-backend is a semi-opaque structure so you might prefer to write its fields as properties instead.)
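
A minimal sketch of such a hook function, using the conversions above (the function name my/gptel-archive-params is illustrative):

(defun my/gptel-archive-params ()
  "Archive the current gptel parameters as Org properties."
  (when (and (bound-and-true-p gptel-mode) (derived-mode-p 'org-mode))
    (org-entry-put nil "GPTEL_TEMPERATURE" (number-to-string gptel-temperature))
    (org-entry-put nil "GPTEL_MODEL" gptel-model)
    (org-entry-put nil "GPTEL_BACKEND" (gptel-backend-name gptel-backend))))

(add-hook 'before-save-hook #'my/gptel-archive-params)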

Another problem is that the prefix (gptel-prompt-prefix-alist) needs to be hardcoded to a single string, e.g., ***. Setting this to a regex, e.g., ^\*+\s+ seems more sensible. This would allow us to put the chats at any heading depth.

I don't see how a regexp can work. What is the corresponding string that will be inserted?

Note that the prompt prefix doesn't have to be a heading at all, it can be a string like "Prompt:" (with the corresponding response prefix as "Response:\n") and you can have chats at any heading depth.
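
For example, something like this (a sketch; gptel-response-prefix-alist is the response-side counterpart of gptel-prompt-prefix-alist):

;; Use plain-text prefixes instead of headings in Org chat buffers
(setf (alist-get 'org-mode gptel-prompt-prefix-alist) "Prompt: ")
(setf (alist-get 'org-mode gptel-response-prefix-alist) "Response:\n")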

karthink commented 6 months ago

where you can set a system prompt, temperature and model for each chat thread, and continue on that thread without having to repeat the context (system prompt, etc.).

The ideal way to do this in Emacs is in Org mode: create a heading and set the system prompt, temperature and model in its property drawer.

Okay, I gave this some thought, and there are many subtle ambiguities in the exact behavior of gptel in the presence of the suggested org properties that I need to resolve before I can add it.


To explain these ambiguities, here is some background:

gptel has slightly different behavior depending on whether gptel-mode is turned on. The main differences are the visual indicators (header bar or mode-line) and saving the state of the chat to disk when writing the buffer to disk.


Presently ambiguous or undesirable behaviors:

  1. With gptel-mode turned on, the user writes an org buffer to disk. The prompt/response boundaries don't depend on the backend and will always be written at (point-min). Under which heading should the other model parameters be written?

  2. Should the behavior be the same irrespective of the state of gptel-mode? If yes, this will mean always searching for Org properties up the document tree before sending a request. Even with an org-element cache this is a small burden on the users who use gptel quite simply in Org mode, which is the majority. Without org-element caching, parsing the buffer can be a lot of extra work.

  3. Suppose the user redirects the prompt/response using the transient menu, such as reading from and echoing the response to the minibuffer. This interaction is basically independent of the current buffer. Should the model parameters be read from the local Org properties? How about when using partial buffer text as a prompt but redirecting the response to a new gptel session? How about when selecting a region that spans two Org headings with two different sets of specified model parameters?

  4. Model parameters are specified as Org properties at some heading level in the document. The user uses the transient menu to change the active backend/model. What takes priority? If the values set from the menu are ignored in favor of the org properties, how do you avoid confusion when the response isn't what the user expects? Consider that the properties might be set several levels above where the user is currently typing. This essentially introduces a second, independent and overriding way of setting model parameters to gptel.

There are a couple more ambiguities to resolve, but I'd like to get some input on the above to find the least-surprising behavior.

NightMachinery commented 6 months ago

@karthink

What is the corresponding string that will be inserted?

Nothing. I can easily create the new heading using Org mode's own hotkeys; there is no need to insert it automatically.

Note that the prompt prefix doesn't have to be a heading at all, it can be a string like "Prompt:" (with the corresponding response prefix as "Response:\n") and you can have chats at any heading depth.

This is a good workaround, but I prefer headings, as they integrate into org naturally.

Plus, in the future, headings allow for some fancy features. E.g., we can use tags to exclude a subtree from the conversation history. We can use the natural branching of the org headings to create branching conversations. (By branching, I mean like when you edit a message in ChatGPT’s website and the UI creates a new branch of the conversation.)

karthink commented 6 months ago

Take a look at "inline tasks". Those would allow changing the sysprompt by adding an "isolated" subheading, which means you can end the body of the headline with a "*** END".

My understanding from talking to Org maintainers is that inline tasks should be avoided whenever possible -- they're having some trouble with parsing that syntax reliably. This is similar to how the preferred LaTeX math delimiters in Org are \( and \) and how $...$ should be avoided, even though it's allowed. So I'm inclined to find other solutions.

karthink commented 6 months ago

Nothing. I can easily create the new heading using Org mode's own hotkeys; there is no need to insert it automatically.

Then you can (setf (alist-get 'org-mode gptel-prompt-prefix-alist nil t) nil), i.e. don't use any prefix.

Plus, in the future, headings allow for some fancy features. E.g., we can use tags to exclude a subtree from the conversation history.

This is a good idea, but I'd prefer not to add more special behavior, like tags with special meanings for gptel. Perhaps we can reuse a state that already has special meaning in Org, like excluding a subtree if it's commented (as in * COMMENT Some title).
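
Sketching that idea (it is not implemented at this point; the layout is only illustrative), an excluded aside might look like:

* Chat about X
Some prompt text...
** COMMENT An aside that gptel would skip under this proposal
Notes that should not be sent to the LLM.
** Continuing the conversation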

We can use the natural branching of the org headings to create branching conversations. (By branching, I mean like when you edit a message in ChatGPT’s website and the UI creates a new branch of the conversation.)

I already do this often with gptel-set-topic.

EDIT: I might have misunderstood what you meant by "branching conversations". When in a branch, I'm guessing the context from before the branch point is included in the conversation. Setting GPTEL_TOPIC simply creates a new conversation starting at that heading, so it's not what you suggested.

Ypot commented 6 months ago

Take a look at "inline tasks". Those would allow changing the sysprompt by adding an "isolated" subheading, which means you can end the body of the headline with a "*** END".

My understanding from talking to Org maintainers is that inline tasks should be avoided whenever possible -- they're having some trouble with parsing that syntax reliably. This is similar to how the preferred LaTeX math delimiters in Org are \( and \) and how $...$ should be avoided, even though it's allowed. So I'm inclined to find other solutions.

Hi! Inline tasks are quite popular among advanced users, and they will stay. What is under discussion is their syntax; a new one is being proposed: https://lists.gnu.org/archive/html/emacs-orgmode/2023-08/msg00708.html

But if you agree that inline tasks could be the perfect tool, it would just be a matter of adjusting to the new syntax in the future.

doctorguile commented 6 months ago

Hi there,

I'm a heavy user of GPT, and I submitted a PR to another package, org-ai, with this very same idea:

https://github.com/rksm/org-ai/pull/99

gptel is extremely flexible, but when I'm working with GPT I need very precise control over the system prompt and temperature (and in the future, function calling, etc.).

I'm very used to the Org mode property-inheritance workflow for a whole lot of tasks, and it is ideal for this use case. (And not having to repeat these settings is nice.)

This really shows Lisp is a curse: because it is so easy to craft a solution that fits your personal needs 99%, we now have 20+ elisp packages for GPT-related functionality :)

But still, I wish one or two solutions would emerge as the dominant projects, so they would have a longer and more stable life span; that's why I'm requesting comments here. Even though I'm not using gptel as my main client for the above reason, I wish for it to have the best overall feature set.

Thanks

NightMachinery commented 6 months ago

@doctorguile I am currently using a yasnippet that expands into:

#+begin_src jupyter-python :kernel py_base :session chatgpt_1 :async yes :exports both
res = openai_chat_complete(
    ##
    # model="gpt-4-0314",
    # model="gpt-4-0613",
    model="gpt-4-turbo",
    # model="gpt-3.5-turbo",
    ##
    messages=[
        # {"role": "system", "content": """You are a senior programmer. You are an expert in functional programming. You use design patterns as appropriate."""},
        {"role": "user", "content": r"""

        """},
    ],
    temperature=0,
    interactive=True,
)

print_chat_streaming(res, copy_mode="chat2")
#+end_src

I have

import openai
import pynight.common_openai 
from pynight.common_openai import (
    openai_key_get,
    setup_openai_key,
    print_chat_streaming,
    openai_chat_complete,
)

pynight.common_openai.openai_key = "..."
openai.api_key = pynight.common_openai.openai_key

in my IPython startup files.

Together with these two elisp functions, which allow me to easily copy the answer as code to be added to the source cell:

(require 'cl-lib) ; for cl-position-if(-not) and cl-subseq

(defun night/org-babel-result-get ()
  "Return the result of the current source block as a string.
Assumes that the source block has already been executed."
  (interactive)
  (save-excursion
    (let ((result-beg (org-babel-where-is-src-block-result))
          result-end)
      (unless result-beg
        (error "No result found for the current source block"))
      (goto-char result-beg)
      (setq result-end (org-babel-result-end))
      (let* ((raw-result (buffer-substring-no-properties result-beg result-end))
             (result (string-trim raw-result))
             (lines (split-string result "\n"))
             ;; First line that is not a #+ keyword or the :RESULTS: drawer opener
             (first-relevant-line-index
              (cl-position-if-not
               (lambda (line)
                 (string-match-p "^\\(#\\+\\|:RESULTS:[ \t\n]*\\)" line))
               lines))
             ;; Last line that is not a separator line or the :END: drawer closer
             (last-relevant-line-index
              (cl-position-if
               (lambda (line)
                 (not
                  (string-match-p "^\\(: ----+\\|:END:\\)[ \t]*$" line)))
               lines :from-end t))
             (result
              (mapconcat
               #'identity
               (cl-subseq
                lines
                (or first-relevant-line-index 0)
                (if last-relevant-line-index
                    (1+ last-relevant-line-index)
                  (length lines)))
               "\n"))
             ;; Strip the leading ": " of fixed-width result lines
             (result (replace-regexp-in-string "^\\(: \\|:$\\)" "" result t t))
             (result (string-trim result)))
        ;; (message "first: %s, last: %s, lines: %s" first-relevant-line-index last-relevant-line-index lines)
        (when (called-interactively-p 'interactive)
          (kill-new result)
          (message "%s" result))
        result))))

(defun night/org-babel-copy-as-chat ()
  "Copies the result section of the current source block as the last message in an LLM chat."
  (interactive)
  (let* (
         (last-msg (night/org-babel-result-get))
         (assistant
          (concat
           "        {\"role\": \"assistant\", \"content\": r\"\"\""
           last-msg
           "\"\"\"},"))
         (chat
          (concat
           assistant
           "\n        {\"role\": \"user\", \"content\": r\"\"\"\n        \n        \"\"\"},")))
    (kill-new chat)))

I am finding this solution the most extensible, and it's robust.

karthink commented 6 months ago

@doctorguile I'm not opposed to using org properties in principle, but I need to resolve the behavioral ambiguities mentioned above before I can add it. There are also performance considerations with scanning the parse tree for each request. (This is assuming org-element-use-cache is t. Otherwise we'd be scanning most of the buffer for each request.)

This really shows Lisp is a curse: because it is so easy to craft a solution that fits your personal needs 99%, we now have 20+ elisp packages for GPT-related functionality :)

This isn't a problem as I see it. gptel is opinionated about its interface and not to the taste of someone who prefers more structured interaction, for instance. They've got chatgpt-shell, org-ai and many more to choose from.

The alternative is a dominant solution that fits everyone's personal needs at 66%, and no one's really satisfied with the experience.

karthink commented 4 months ago

But still, I wish one or two solutions would emerge as the dominant projects, so they would have a longer and more stable life span; that's why I'm requesting comments here. Even though I'm not using gptel as my main client for the above reason, I wish for it to have the best overall feature set.

I finally had some time to think about this feature, so I've added support for "stateless" gptel configuration via Org properties. It's on the feature-org branch; I plan to merge it soon, depending on the feedback I get.

Here's how it works:

  1. Anywhere in an Org buffer, you can call gptel-org-set-config to write the current configuration (system message, backend and model, temperature and max-tokens) as Org properties in the current heading. Alternatively, you can write these properties yourself using org-set-property. They're all named GPTEL_* (see the example after this list).

  2. From this point on, all use of gptel from under that heading will use these properties instead of the values defined using the transient menu or via your global configuration. Note that bringing up the transient menu will show the buffer-local, not heading-local values. I might make the menu more responsive in the future.

  3. Whether these properties are inherited by subheadings depends on your Org configuration. I'm not doing anything special, so it will respect your value of org-use-property-inheritance.

  4. There are two affordances available for setting context: (preexisting) you can call gptel-org-set-topic to make a heading a separate conversation, or (new) set gptel-org-branching-context to t to make each same-level heading a different branch of the conversation. (See its doc string for an example.)
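
For illustration, a heading configured this way ends up with a property drawer like the following (the values here are made up; the property names are the GPTEL_* names used on the branch):

* Python questions
:PROPERTIES:
:GPTEL_SYSTEM: You are a Python expert.
:GPTEL_BACKEND: ChatGPT
:GPTEL_MODEL: gpt-4
:GPTEL_TEMPERATURE: 0.2
:GPTEL_MAX_TOKENS: 500
:END: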

No command/variable names are final.

The load order is a little funky since I try to avoid loading Org in gptel. Please let me know if it builds correctly for you. I would also appreciate everyone's feedback on the feature(s) -- again, it's on the feature-org branch.

doctorguile commented 4 months ago

Hi Karthik,

I just tried out your feature-org branch.

I tried creating a few org headings, creating a thread, duplicating a thread to go in a different direction, etc.

It works so far, very impressed.

Using text properties without explicit text delimiters is very clever and makes the output clean.

However, when I save the buffer and see GPTEL_BOUNDS updated at the top of the org file, I keep wondering whether there's an error, as I can't easily eyeball and verify it.

I am sure someone has asked before: is there a way to render the AI's response text in a different shade of color, so we can visually see the boundary between user input and AI response?

Thanks!

karthink commented 4 months ago

I am sure someone has asked before: is there a way to render the AI's response text in a different shade of color, so we can visually see the boundary between user input and AI response?

There's nothing in gptel for this, but it would be very easy to add something. Here's an example:

(BEFORE and AFTER screenshots attached in the original issue.)

And here's the code if you want to try it:

(defface gptel-response-face '((t :inherit mode-line-highlight))
  "Face used to highlight LLM responses by gptel.")

;; The divider uses `make-separator-line' (Emacs 28.1+ only);
;; set to nil if you're on Emacs 27 or lower
(defvar gptel-use-response-divider t
  "Whether LLM responses should be demarcated with lines.")

;; Highlight response region
(defun gptel-colorize-region (beg end)
  "Highlight the LLM response between BEG and END."
  (when (and gptel-mode beg end)
    (add-text-properties
     beg end
     '(font-lock-face (:inherit gptel-response-face :extend t)))
    (when gptel-use-response-divider
      (unless (<= beg (point-min))   ; don't underflow at buffer start
        (put-text-property
         (1- beg) beg 'display (concat "\n" (make-separator-line))))
      (unless (>= end (point-max))
        (put-text-property
         end (1+ end) 'display (concat "\n" (make-separator-line)))))))

;; Auto-highlight after inserting a response
(add-hook 'gptel-post-response-functions #'gptel-colorize-region)

;; Highlight from metadata on enabling gptel-mode
;; Org mode only, can write a more general version if necessary

(defun gptel-org-colorize-buffer ()
  "Highlight saved LLM responses using the GPTEL_BOUNDS property."
  (when (and (derived-mode-p 'org-mode) gptel-mode)
    (condition-case-unless-debug val
        (save-restriction
          (widen)                       ; GPTEL_BOUNDS stores absolute positions
          (when-let ((bounds (org-entry-get (point-min) "GPTEL_BOUNDS")))
            (mapc (pcase-lambda (`(,beg . ,end)) (gptel-colorize-region beg end))
                  (read bounds))))
      (error (message "Coloring gptel responses failed with error: %S" val)))))

(add-hook 'gptel-mode-hook #'gptel-org-colorize-buffer)

Let me know what you think.

doctorguile commented 4 months ago

That was awesome. It completely addressed the visualization issue, you rock! (Probably useful to others; worth adding to the wiki/README.)

BTW, what do you think of adding image query support? I see that you have dedicated support for Org and Markdown; I think we can use their specific image link syntaxes:

[[/path/to/image.jpg][Alt text]]

![Alt text](/path/to/image.jpg)

Not sure what syntax to use in other contexts. Maybe it's good enough to support it in Org and Markdown.

https://github.com/karthink/gptel/discussions/231

Thanks

karthink commented 4 months ago

(Probably useful to others; worth adding to the wiki/README.)

The above is throwaway code for demonstration purposes, so it's fragile. You can delete the separator bars and have no way to get them back, for instance. I can fix it up and maybe add it as an optional feature to gptel instead.

doctorguile commented 4 months ago

Hi @karthink

I turned on the debug log and noticed something minor. Not trying to put anything on your plate, but just FYI.

  1. It seems to send property drawer contents as well. The default expectation for most users is probably that these should be skipped. It might or might not affect LLM behavior, depending on what you put there.

  2. Sometimes the Org heading asterisks come through (e.g. *, ****), but sometimes (***) they don't. Probably very minor and doesn't affect the LLM response.

Will try to keep you posted if I find anything else interesting.

Thanks again.

Example org file (I edited the LLM output to save space here, so the bounds are off if you try to use it as-is, but it should be easily reproducible):

:PROPERTIES:
:header-args: :dir /Users/ :results drawer :wrap example
:GPTEL_MODEL: gpt-3.5-turbo
:GPTEL_BACKEND: Azure
:GPTEL_BOUNDS: ((237 . 278) (337 . 661) (823 . 1329) (1373 . 1955))
:END:

* level 1
hi there, what model are you?

I am a large language model called GPT-3.

*** how up to date are you with knowledge of the world?

As an AI language model, I am trained on a diverse range of internet sources...

*** can you let me know during newton issac's most productive period, what level of technology does he have in his daily life? 

During Isaac Newton's most productive period, which was in the 17th century, the level of technology was quite different from what we have today. 

**** how about pens and paper and books?

During Newton's time, pens and paper were indeed in use. 

Payload:

{
  "messages": [
    {
      "role": "system",
      "content": "You are a large language model living in Emacs and a helpful assistant. Respond concisely."
    },
    {
      "role": "user",
      "content": ":PROPERTIES:\n:header-args: :dir /Users/ :results drawer :wrap example\n:GPTEL_MODEL: gpt-3.5-turbo\n:GPTEL_BACKEND: Azure\n:GPTEL_BOUNDS: ((223 . 264) (323 . 647) (809 . 1315))\n:END:\n\n* level 1\nhi there, what model are you?"
    },
    {
      "role": "assistant",
      "content": "I am a large language model called GPT-3."
    },
    {
      "role": "user",
      "content": "how up to date are you with knowledge of the world?"
    },
    {
      "role": "assistant",
      "content": "As an AI language model, I am trained on a diverse range of internet sources..."
    },
    {
      "role": "user",
      "content": "can you let me know during newton issac's most productive period, what level of technology does he have in his daily life?"
    },
    {
      "role": "assistant",
      "content": "During Isaac Newton's most productive period, which was in the 17th century, the level of technology was quite different from what we have today."
    },
    {
      "role": "user",
      "content": "**** how about pens and paper and books?"
    }
  ],
  "stream": true,
  "temperature": 1.0
}

karthink commented 4 months ago

Not trying to put anything on your plate, but just FYI.

It seems to send property drawer contents as well. The default expectation for most users is probably that these should be skipped. It might or might not affect LLM behavior, depending on what you put there.

Yes, I am aware of this. I'm trying to keep the mental model for the user as simple as possible, so gptel really does send everything above the cursor, including Org properties, keywords, tags etc. I don't want to add any special behavior yet, although I might in the future.

Sometimes the Org heading asterisks come through (e.g. *, ****), but sometimes (***) they don't. Probably very minor and doesn't affect the LLM response.

Is this the prompt prefix that gptel inserts after the LLM response? If yes, what is your value of gptel-prompt-prefix-alist?

doctorguile commented 4 months ago

It's the default:

((markdown-mode . "### ")
 (org-mode . "*** ")
 (text-mode . "### "))

So it seems *, **, ****, and ***** (basically 1, 2, and 4+ asterisks) will get sent; just the 3-asterisk prefix *** will be stripped. Again, I don't think it's a big deal for now.

As for the org properties, I have some sensitive info (that I don't want to get logged) and sometimes a large string (it eats into the context window and confuses the conversation).

So I'll patch my own copy for now if this is a low priority in general.

Thanks

NightMachinery commented 4 months ago

@karthink There is a warning in org-use-property-inheritance's doc about performance issues when turning it on. It says one can force inheritance when using the query functions. Perhaps you can force inheritance for gptel? I don't use inherited properties for anything else, so I'd like to avoid the overhead.

karthink commented 4 months ago

@karthink There is a warning in org-use-property-inheritance's doc about performance issues when turning it on. It says one can force inheritance when using the query functions. Perhaps you can force inheritance for gptel?

I'm not going to force inheritance for gptel precisely because of the performance issues. This is what I was worried about above, but it turns out it's even worse, because the parse tree (as cached by org-element) does not contain this property info in most cases.

If I force inheritance here, every user who's using gptel in Org mode will be paying the performance penalty, since the GPTEL_* properties have to be searched for even if they're not aware of the feature. This is for each query, and (eventually) every time they bring up the transient menu.

I don't use inherited properties for anything else, so I'd like to avoid the overhead.

If you are okay with the performance hit, you can selectively add the GPTEL_* properties to org-use-property-inheritance. From the documentation:

When nil, only the properties directly given in the current entry count. When t, every property is inherited. The value may also be a list of properties that should have inheritance, or a regular expression matching properties that should be inherited.

So you could run

(setopt org-use-property-inheritance "GPTEL_.*")
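
Equivalently, if you prefer an explicit list of property names over a regexp (the names below are the ones used in this thread; extend as needed):

(setopt org-use-property-inheritance
        '("GPTEL_MODEL" "GPTEL_BACKEND" "GPTEL_SYSTEM" "GPTEL_TEMPERATURE"))
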
karthink commented 3 months ago

The feature-org branch has been merged into master. Using Org mode properties to configure the system message, model, backend, max tokens and temperature is now fully implemented. These properties can be set manually, with org-set-property, or with gptel-org-set-properties.