karthink / gptel

A simple LLM client for Emacs
GNU General Public License v3.0
1.04k stars 113 forks source link

In-buffer refactored code formatting #171

Closed daedsidog closed 6 months ago

daedsidog commented 6 months ago

Currently when you use in-buffer code refactoring, more often than not it's not satisfactory. That is to say, the code that he returns is usually good depending on your prompt or model, but the actual formatting of the code, usually, is bad, because of mismatching indentation & the inclusion of Markdown formatting in the returned code.

I fixed this for myself by having the following in my config:

(cl-defun my/clean-up-gptel-refactored-code ()
  "Clean up the code responses for refactored code in the current buffer.
Current buffer is guaranteed to be the response buffer."
  (when gptel-mode ; Don't want this to happen in the dedicated buffer.
    (cl-return-from my/clean-up-gptel-refactored-code))
  (save-excursion
    (let* ((res-beg (point-min))
           (res-end nil)
           (contents nil))
      (unless (get-text-property res-beg 'gptel)
        (setq res-beg (next-single-property-change res-beg 'gptel)))
      (while res-beg
        (setq res-end (next-single-property-change res-beg 'gptel))
        (unless res-end
          (setq res-end (point-max)))
        (setq contents (buffer-substring-no-properties res-beg
                                                       res-end))
        (setq contents (replace-regexp-in-string "\n*``.*\n*"
                                                 ""
                                                 contents))
        (delete-region res-beg res-end)
        (goto-char res-beg)
        (insert contents)
        (setq res-end (point))
        ;; Indent the code to match the buffer indentation if it's messed up.
        (indent-region res-beg res-end)
        (pulse-momentary-highlight-region res-beg res-end)
        (setq res-beg (next-single-property-change res-beg 'gptel))))))

Then, in gptel config:

(add-hook 'gptel-post-response-hook #'my/clean-up-gptel-refactored-code)

What this does is remove the Markdown tags and automatically indent the code with respect to the buffer. When you send the prompt, you can see the streamed-in bad code, and after the streaming is complete, it refactors it.

I opened it as an issue instead of a pull request because the code above is more of a quick & dirty hack instead of a fundamental change to the plugin, and wasn't sure if this was a desired feature to begin with. It seems to me that perhaps a better approach would be to have the prompt to automatically include something that tells him to respect formatting and not use Markdown, but even then he might fail.

karthink commented 6 months ago

@daedsidog This is neat, thanks. I think this should be on the wiki and not a built-in gptel option. The problem is that LLM output varies greatly (both within an LLM and across LLMs), so code that tries to nail down inconsistencies like this will be fragile. This is already the case with the rudimentary markdown->org converter gptel uses, I'd like to avoid adding more code that tries to work around formatting issues.

The best way to address this is through the system prompt. With the OpenAI models at least, I've had good results with specifying that only code should be included in the response: "Provide code and only code as output without any additional text, prompt or note."

I can see this being quite useful for some LLMs that don't take directions well, like the Mistral models. Could you add it to the wiki?

daedsidog commented 6 months ago

I tried, but I'm not entirely sure how. I cloned the wiki and tried to push my own branch, but it says I don't have permission. I don't know how to fork a wiki, either.

The below needs to be appended to Home.md:

## Formatting in-buffer refactored code

Depending on your model or prompt, sometimes returned in-buffer refactored code would not be perfect, i.e. it would have incorrect indentation or even include Markdown formatting. The below is an example for how to add a hook to automatically fix such cases.

```emacs-lisp
(cl-defun my/clean-up-gptel-refactored-code ()
  "Clean up the code responses for refactored code in the current buffer.
Current buffer is guaranteed to be the response buffer."
  (when gptel-mode ; Don't want this to happen in the dedicated buffer.
    (cl-return-from my/clean-up-gptel-refactored-code))
  (save-excursion
    (let* ((res-beg (point-min))
           (res-end nil)
           (contents nil))
      (unless (get-text-property res-beg 'gptel)
        (setq res-beg (next-single-property-change res-beg 'gptel)))
      (while res-beg
        (setq res-end (next-single-property-change res-beg 'gptel))
        (unless res-end
          (setq res-end (point-max)))
        (setq contents (buffer-substring-no-properties res-beg
                                                       res-end))
        (setq contents (replace-regexp-in-string "\n*``.*\n*"
                                                 ""
                                                 contents))
        (delete-region res-beg res-end)
        (goto-char res-beg)
        (insert contents)
        (setq res-end (point))
        ;; Indent the code to match the buffer indentation if it's messed up.
        (indent-region res-beg res-end)
        (pulse-momentary-highlight-region res-beg res-end)
        (setq res-beg (next-single-property-change res-beg 'gptel))))))

Then, where you config gptel, add:

(add-hook 'gptel-post-response-hook #'my/clean-up-gptel-refactored-code)

What this does is remove the Markdown tags and automatically indent the code with respect to the buffer. When you send the prompt, you can see the streamed-in bad code, and after the streaming is complete, it refactors it.

karthink commented 6 months ago

My apologies, the Wiki was locked to "collaborators only". I didn't realize it wasn't publicly editable. You can go ahead now.

karthink commented 5 months ago

@daedsidog heads-up about a breaking change: I renamed gptel-post-response-hook to gptel-post-response-functions and changed how it behaves. It now calls hook functions with the start and end buffer positions of the response, which makes it very easy to do the kind of transformations like in this thread. Here's the new version of your my/clean-up-gptel-refactored-code, please update your config.

(cl-defun my/clean-up-gptel-refactored-code (beg end)
  "Clean up the code responses for refactored code in the current buffer.

The response is placed between BEG and END.  The current buffer is
guaranteed to be the response buffer."
  (when gptel-mode          ; Don't want this to happen in the dedicated buffer.
    (cl-return-from my/clean-up-gptel-refactored-code))
  (when (and beg end)
    (save-excursion
      (let ((contents
             (replace-regexp-in-string
              "\n*``.*\n*" ""
              (buffer-substring-no-properties beg end))))
        (delete-region beg end)
        (goto-char beg)
        (insert contents))
      ;; Indent the code to match the buffer indentation if it's messed up.
      (indent-region beg end)
      (pulse-momentary-highlight-region beg end))))

I've updated the wiki as well.