xenodium / chatgpt-shell

ChatGPT and DALL-E Emacs shells + Org babel 🦄 + a shell maker for other providers
https://xenodium.com
GNU General Public License v3.0

Using local Ollama models #201

Open glenstarchman opened 3 months ago

glenstarchman commented 3 months ago

This is neither a feature request nor a bug, but hopefully others will find it useful.

I wanted to experiment with code refactoring using local models while still using the awesome chatgpt-shell. Here is how I got it to work:

;; your ollama endpoint
(setq chatgpt-shell-api-url-base "http://127.0.0.1:11434")

;; models you have pulled for use with ollama
(setq chatgpt-shell-model-versions
      '("gemma:2b-instruct"
        "zephry:latest"
        "codellama:instruct"
        "magicoder:7b-s-cl-q4_0"
        "starcoder:latest"
        "deepseek-coder:1.3b-instruct-q5_1"
        "qwen:1.8b"
        "mistral:7b-instruct"
        "orca-mini:7b"
        "orca-mini:3b"
        "openchat:7b-v3.5-q4_0"))

;; override how chatgpt-shell determines the context length
;; NOTE: use this as a template and adjust as needed
(defun chatgpt-shell--approximate-context-length (model messages)
  "Approximate the context length using MODEL and MESSAGES."
  (let* ((original-length (floor (/ (length messages) 2)))
         (context-length original-length)
         ;; Strip the "ft:" prefix so fine-tuned models match the
         ;; same prefixes as their base models.
         (model (string-remove-prefix "ft:" model))
         (tokens-per-message 4)
         ;; Approximate context windows for the models listed above;
         ;; adjust these to match whatever you have pulled.
         (max-tokens
          (cond
           ((or (string-prefix-p "starcoder" model)
                (string-prefix-p "magicoder" model))
            4096)
           ((or (string-prefix-p "gemma" model)
                (string-prefix-p "openchat" model)
                (string-prefix-p "codellama" model)
                (string-prefix-p "zephyr" model)
                (string-prefix-p "qwen" model)
                (string-prefix-p "deepseek-coder" model)
                (string-prefix-p "mistral" model)
                (string-prefix-p "orca" model))
            8192)
           (t
            (error "Don't know '%s', so can't approximate context length"
                   model)))))
    ;; Drop the oldest messages until the conversation fits the window.
    (while (> (chatgpt-shell--num-tokens-from-messages
               tokens-per-message messages)
              max-tokens)
      (setq messages (cdr messages)))
    (setq context-length (floor (/ (length messages) 2)))
    (unless (eq original-length context-length)
      (message "Warning: chatgpt-shell context clipped"))
    context-length))

I have found that the gemma models integrate best, with correct code formatting and so on, but your mileage may vary.

The majority of chatgpt-shell features work, and you can even change models with C-c C-v.
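
For reference, C-c C-v runs chatgpt-shell-swap-model-version. If you want a particular model as the default rather than picking one interactively, something like this should work (a sketch; as far as I can tell, chatgpt-shell-model-version takes an index into chatgpt-shell-model-versions):

(require 'cl-lib)

;; Default to mistral by looking up its index in the list above.
(setq chatgpt-shell-model-version
      (cl-position "mistral:7b-instruct" chatgpt-shell-model-versions
                   :test #'string=))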

xenodium commented 2 months ago

Thanks for this, Glen! This is impressive and great to see. I'd been meaning to create a higher-level abstraction that reuses more chatgpt-shell things, maybe on top of shell-maker: https://xenodium.com/a-shell-maker.

I've not had a chance to play with these models. I'm guessing they're also implementing OpenAI's API/schema, which would make reusing more things easier for chatgpt-shell.
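
For anyone curious, the shape of a shell-maker shell is roughly this, going from memory of the greeter example in that post (treat the exact signatures as approximate):

(require 'shell-maker)

(defvar greeter-shell--config
  (make-shell-maker-config
   :name "Greeter"
   :execute-command
   (lambda (command _history callback _error-callback)
     ;; An Ollama-backed shell would POST COMMAND to the local
     ;; endpoint here and stream the reply through CALLBACK.
     (funcall callback (format "Hello \"%s\"" command) nil))))

(defun greeter-shell ()
  "Start an example shell-maker shell."
  (interactive)
  (shell-maker-start greeter-shell--config))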

LemonBreezes commented 1 day ago

Okay, I just got this package working with Open WebUI, which I really like as a wrapper for Ollama.

The first thing I had to do was open the settings for the currently logged-in user by clicking the top-right user bubble, then click Settings > Account > API keys, create an API key, and set it as your chatgpt-shell-openai-key. Then adapt this code to fit your Open WebUI instance:

(after! chatgpt-shell
  ;; your ollama endpoint
  (setq chatgpt-shell-api-url-base "http://wydrogen:3000"
        chatgpt-shell-api-url-path "/ollama/api/chat")

  ;; models you have pulled for use with ollama
  (setq chatgpt-shell-model-versions
        '("dolphin-mixtral:latest"
          "llama3:latest"
          "llava:13b"
          "gemma2:27b"
          "deepseek-coder-v2:latest"))

  (defvar chatgpt-shell-model-settings
    (list (cons "llama3:latest" '((max-tokens . 8192)))
          (cons "llava:13b" '((max-tokens . 8192)))
          (cons "gemma2:27b" '((max-tokens . 8192)))
          (cons "dolphin-mixtral:latest" '((max-tokens . 8192)))
          (cons "deepseek-coder-v2:latest" '((max-tokens . 8192)))))

  ;; Adapt the above function to our `chatgpt-shell-model-settings'
  (defun chatgpt-shell--approximate-context-length (model messages)
    "Approximate the context length using MODEL and MESSAGES."
    (let* ((tokens-per-message 4)
           (max-tokens)
           (original-length (floor (/ (length messages) 2)))
           (context-length original-length))
      (let ((settings (alist-get model chatgpt-shell-model-settings
                                 nil nil #'string=)))
        ;; `alist-get' compares with `eq' by default, which never
        ;; matches string keys, so pass an explicit string test.
        ;; Fall back to a 4096-token window for unlisted models.
        (setq max-tokens (alist-get 'max-tokens settings 4096)))
      (while (> (chatgpt-shell--num-tokens-from-messages
                 tokens-per-message messages)
                max-tokens)
        (setq messages (cdr messages)))
      (setq context-length (floor (/ (length messages) 2)))
      (unless (eq original-length context-length)
        (message "Warning: chatgpt-shell context clipped"))
      context-length))

  (defun chatgpt-shell--extract-chatgpt-response (json)
    "Extract ChatGPT response from JSON."
    ;; Streamed payloads arrive already parsed as alists.
    (if (consp json)
        (let-alist json
          (or .delta.content
              .message.content
              .error.message
              ""))
      (if-let (parsed (shell-maker--json-parse-string json))
          (string-trim
           (let-alist parsed
             .message.content))
        (when-let (parsed-error (shell-maker--json-parse-string-filtering
                                 json "^curl:.*\n?"))
          (let-alist parsed-error
            .error.message))))))
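
A quick way to sanity-check the extractor from ielm or *scratch* is to feed it a hand-written alist shaped like an Ollama /api/chat message (sample data, not real server output):

;; Already-parsed payloads take the first branch of the extractor.
(chatgpt-shell--extract-chatgpt-response
 '((message . ((content . "hello from the model")))))
;; => "hello from the model"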
LemonBreezes commented 1 day ago

I also like using this to remove the "ChatGPT" branding from the prompt:

(defun chatgpt-shell--prompt-pair ()
  "Return a pair with prompt and prompt-regexp."
  (cons
   (format "Ollama(%s)> " (chatgpt-shell--shell-info))
   (rx (seq bol "Ollama" (one-or-more (not (any "\n"))) ">" (or space "\n")))))

(eval '(setf (shell-maker-config-prompt chatgpt-shell--config)
             (car (chatgpt-shell--prompt-pair))))
(eval '(setf (shell-maker-config-prompt-regexp chatgpt-shell--config)
             (cdr (chatgpt-shell--prompt-pair))))
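
The eval wrapping is presumably there so the setf forms don't get macro-expanded before chatgpt-shell--config and its struct accessors exist. An alternative is to defer the whole thing until the package loads, along these lines (untested sketch, same accessors as above):

(with-eval-after-load 'chatgpt-shell
  (setf (shell-maker-config-prompt chatgpt-shell--config)
        (car (chatgpt-shell--prompt-pair)))
  (setf (shell-maker-config-prompt-regexp chatgpt-shell--config)
        (cdr (chatgpt-shell--prompt-pair))))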
xenodium commented 1 day ago

This is really cool @LemonBreezes! Nice work.

I'm guessing since the LLM APIs are the same, most chatgpt-shell features work? Like chatgpt-shell-swap-system-prompt, chatgpt-shell-swap-model-version, and chatgpt-shell-prompt-compose?