Contact to llamafile-AI on server fails

nameiwillforget commented 3 months ago

I have two AIs set up, one on my laptop and one on my desktop:

(use-package gptel
  :config
  (gptel-make-openai "testai"          ;Any name
    :stream t                             ;Stream responses
    :protocol "http"
    :host "localhost:8080"                ;Llama.cpp server location
    :models '("test")
    :key nil)

  (gptel-make-openai "desktop"          ;Any name
    :stream t                             ;Stream responses
    :protocol "http"
    :host "10.0.0.8:8080"                 ;Llama.cpp server location
    :models '("test")
    :key nil)

  ;; (setq-default
  ;;  gptel-model   "test"
  ;;  gptel-backend (gptel-make-openai "testai"
  ;;                  :stream t
  ;;                  :protocol "http"
  ;;                  :host "localhost:8080"
  ;;                  :models '("test")))

  (setq-default
   gptel-model   "test"
   gptel-backend (gptel-make-openai "desktop"
                   :stream t
                   :protocol "http"
                   :host "10.0.0.8:8080"
                   :models '("test"))))

If I use the local machine with the defaults that are commented out here, it works. If I try to use the desktop AI, I get the following error:

desktop response error: ((c4bb9327bb265bd639a950ab5ffe93f8 . 0)) Could not parse HTTP response.

The following shell script, which copies a file and feeds it to the desktop AI, works though:

#!/bin/bash
# Copy the file to the desktop, then run the llamafile on it over SSH.
scp "$1" alex@10.0.0.8:/home/alex/wizard
ssh alex@10.0.0.8 "sh ~/.local/bin/wizardcoder-python-34b-v1.0.Q5_K_M.llamafile /home/alex/wizard/$1"

What's the problem?

karthink commented 3 months ago

I'm assuming you're using the server llamafile on your desktop and not the other one.

Try looking at the request log:

1. Run (setq gptel-log-level 'debug)
2. Try to use the desktop llamafile and reproduce the error
3. Look at the *gptel-log* buffer. The curl command and the HTTP response should be present. You can paste that here.

nameiwillforget commented 3 months ago

Here is the log:

{
  "gptel": "request headers",
  "timestamp": "2024-03-08 00:42:48"
}
{
  "Content-Type": "application/json"
}
{
  "gptel": "request body",
  "timestamp": "2024-03-08 00:42:48"
}
{
  "model": "test",
  "messages": [
    {
      "role": "system",
      "content": "You are a large language model living in Emacs and a helpful assistant. Respond concisely."
    },
    {
      "role": "user",
      "content": "Can you hear me?"
    }
  ],
  "stream": false,
  "temperature": 1.0
}
{
  "gptel": "request Curl command",
  "timestamp": "2024-03-08 00:42:48"
}
[
  "curl",
  "--disable",
  "--location",
  "--silent",
  "--compressed",
  "-XPOST",
  "-y300",
  "-Y1",
  "-D-",
  "-w(75aecd991c05b7de7a6e566cc05016ad . %{size_header})",
  "-d{\"model\":\"test\",\"messages\":[{\"role\":\"system\",\"content\":\"You are a large language model living in Emacs and a helpful assistant. Respond concisely.\"},{\"role\":\"user\",\"content\":\"Can you hear me?\"}],\"stream\":false,\"temperature\":1.0}",
  "-HContent-Type: application/json",
  "http://localhost:8080/v1/chat/completions"

karthink commented 3 months ago

@nameiwillforget this looks incomplete, did you grab everything in the log buffer?

nameiwillforget commented 3 months ago

Yes, but there was another gptel-buffer, gptel-curl:

HTTP/1.1 200 OK
Access-Control-Allow-Origin: 
Content-Type: text/event-stream
Keep-Alive: timeout=5, max=5
Server: llama.cpp
Transfer-Encoding: chunked

(d062d386c408445be36c4ba19bd78419 . 160)

karthink commented 3 months ago

[
  "curl",
  "--disable",
  "--location",
  "--silent",
  "--compressed",
  "-XPOST",
  "-y300",
  "-Y1",
  "-D-",
  "-w(75aecd991c05b7de7a6e566cc05016ad . %{size_header})",
  "-d{\"model\":\"test\",\"messages\":[{\"role\":\"system\",\"content\":\"You are a large language model living in Emacs and a helpful assistant. Respond concisely.\"},{\"role\":\"user\",\"content\":\"Can you hear me?\"}],\"stream\":false,\"temperature\":1.0}",
  "-HContent-Type: application/json",
  "http://localhost:8080/v1/chat/completions"

I meant in the gptel-log buffer. It looks like the above log is incomplete. Could you try again?

nameiwillforget commented 3 months ago

I just tried again, and now it simply worked. I don't know what changed; I had tried several times before. I changed how llamafile files are executed by default using mimeo. Could that have something to do with it? The LLM was running before I changed that, I think, so I'm not sure how it would. Anyway, I'll look and try to find out what changed.

nameiwillforget commented 3 months ago

So it seems that only the connection between the laptop and the desktop fails: if I do exactly the same thing from the desktop itself, it works. I tried again and I think the resulting log is the same, but here it is nevertheless:

{
  "gptel": "request headers",
  "timestamp": "2024-03-11 21:48:18"
}
{
  "Content-Type": "application/json"
}
{
  "gptel": "request body",
  "timestamp": "2024-03-11 21:48:18"
}
{
  "model": "test",
  "messages": [
    {
      "role": "system",
      "content": "You are a large language model living in Emacs and a helpful assistant. Respond concisely."
    },
    {
      "role": "user",
      "content": "Can you hear me?"
    }
  ],
  "stream": false,
  "temperature": 1.0
}
{
  "gptel": "request Curl command",
  "timestamp": "2024-03-11 21:48:18"
}
[
  "curl",
  "--disable",
  "--location",
  "--silent",
  "--compressed",
  "-XPOST",
  "-y300",
  "-Y1",
  "-D-",
  "-w(8866ba70f2e8a5a85ab4dc25c869e5a1 . %{size_header})",
  "-d{\"model\":\"test\",\"messages\":[{\"role\":\"system\",\"content\":\"You are a large language model living in Emacs and a helpful assistant. Respond concisely.\"},{\"role\":\"user\",\"content\":\"Can you hear me?\"}],\"stream\":false,\"temperature\":1.0}",
  "-HContent-Type: application/json",
  "http://10.0.0.8:8080/v1/chat/completions"
]

karthink commented 3 months ago

What happens if you run that curl command manually?


curl --disable --location --silent --compressed -XPOST -y300 -Y1 -D- \
     -w'(8866ba70f2e8a5a85ab4dc25c869e5a1 . %{size_header})' \
     -d"{\"model\":\"test\",\"messages\":[{\"role\":\"system\",\"content\":\"You are a large language model living in Emacs and a helpful assistant. Respond concisely.\"},{\"role\":\"user\",\"content\":\"Can you hear me?\"}],\"stream\":false,\"temperature\":1.0}" -H"Content-Type: application/json" \
     'http://10.0.0.8:8080/v1/chat/completions'

nameiwillforget commented 3 months ago

I get the following output:

(8866ba70f2e8a5a85ab4dc25c869e5a1 . 0)

I successfully contacted the model from the desktop immediately before that.
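
The trailing `(TOKEN . N)` in these outputs comes from the `-w` option in the logged curl command, where `%{size_header}` is curl's count of response header bytes. Below is a minimal sketch of reading that value, assuming the trailer always has the shape seen in this thread. The helper names `trailer_header_size` and `describe_header_size` are made up for illustration and are not part of gptel.

```shell
#!/bin/bash
# Extract N from a trailer of the form "(TOKEN . N)", the format produced
# by the -w option in the logged curl command.
# Hypothetical helper, not part of gptel.
trailer_header_size() {
  printf '%s\n' "$1" | sed -n 's/^(\([0-9a-f]*\) \. \([0-9]*\))$/\2/p'
}

# Say what a given header size implies about the exchange.
describe_header_size() {
  if [ "$1" -eq 0 ]; then
    echo "no HTTP headers received"
  else
    echo "received $1 bytes of HTTP headers"
  fi
}
```

For example, `describe_header_size "$(trailer_header_size '(8866ba70f2e8a5a85ab4dc25c869e5a1 . 0)')"` reports that no headers were received, whereas the earlier `(d062d386c408445be36c4ba19bd78419 . 160)` trailer corresponds to a response that did arrive. A size of 0 suggests curl never received any response from 10.0.0.8:8080.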

karthink commented 3 months ago

> I get the following output:

This is a networking/connection issue, unrelated to gptel. I suggest checking if you can ping your desktop/laptop from the other device first.
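
The connectivity check suggested above can be sketched in shell. This is only an illustration: `explain_curl_exit` and `check_server` are hypothetical helper names, the `ping` flags assume the Linux iputils version, and the exit-status meanings are the ones listed in the curl manual.

```shell
#!/bin/bash
# Translate a curl exit status into a likely cause (meanings per curl(1)).
explain_curl_exit() {
  case "$1" in
    0)  echo "HTTP response received" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect to host" ;;
    28) echo "operation timed out" ;;
    52) echo "empty reply from server" ;;
    *)  echo "curl exit status $1" ;;
  esac
}

# Ping the host once, then try a plain HTTP request to the given port.
# Assumes Linux iputils ping, where -W is a timeout in seconds.
check_server() {
  local host="$1" port="$2" status=0
  if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
    echo "ping: $host is reachable"
  else
    echo "ping: $host did not answer"
  fi
  curl --silent --output /dev/null --connect-timeout 5 --max-time 10 \
       "http://${host}:${port}/" || status=$?
  echo "curl: $(explain_curl_exit "$status")"
}
```

Running `check_server 10.0.0.8 8080` from the laptop would separate "host unreachable" from "port not accepting connections". If ping succeeds but curl fails to connect, one thing worth checking (an assumption, not confirmed in this thread) is whether the server on the desktop listens only on 127.0.0.1, which would be consistent with requests from the desktop itself succeeding.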

karthink / gptel

Contact to llamafile-AI on server fails #232