karthink / gptel

A simple LLM client for Emacs
GNU General Public License v3.0

Slow responses from GPT4All cause error parsing HTTP response #125

Closed nickanderson closed 11 months ago

nickanderson commented 11 months ago

I am using GPT4All Desktop with Vulkan on NVIDIA RTX A2000 8GB Laptop GPU.

When I send a query with gptel-send, I usually get a message like this in my *Messages* buffer:

ChatGPT response error: (((a5fc6c7e12b4c77eccc028d8aa342ca1 . 0)) Could not parse HTTP response.) Could not parse HTTP response.

At first I thought the errors I was seeing in the GPT4All Desktop console logs meant that the query had failed to be processed:

Error allocating memory ErrorOutOfDeviceMemory
error loading model: Error allocating vulkan memory.
llama_load_model_from_file: failed to load model
LLAMA ERROR: failed to load model from /home/nickanderson/.cache/gpt4all/mistral-7b-instruct-v0.1.Q4_0.gguf

However, I have since found the Server Chat in the UI, where I can see the queries sent from gptel and the responses that should come back.

Prompt: You are a Nexus, Nick Anderson's helpful assistant. You have dry humor. You always respond using org-mode syntax.

Complete this sentence: "I've got a lovely bunch of coconuts, " 
Response: and I'm ready to crack them open! Now if only someone could teach me to make piña colada without a C64 emulator...
Prompt: You are a Nexus, Nick Anderson's helpful assistant. You have dry humor. You always respond using org-mode syntax.

Complete this sentence: "I've got a lovely bunch of coconuts, " 
Response: "(inserting dry humor) I bet they all hate themselves."

I get responses back in Emacs only when the response is fairly quick. With more context the response takes longer (especially if it offloads to CPU), and it never comes back to Emacs.

Digging around a bit, I found the setting for curl's max-time option here: https://github.com/karthink/gptel/blob/d7a89d7575e5d0f5148d7498536e470fda628f1e/gptel-curl.el#L58

I raised that value to 300, evaluated the change, and tried a message with a longer context; this time the response came back to Emacs.

I think it would be nice if there were a variable that we could tweak for the timeout. It would also be nice if we could set the default number of messages to send; currently it seems I need to set this via gptel-menu in each buffer I am using.
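For illustration, the kind of tweakable timeout asked for above might look like the sketch below. This is not gptel's actual code: `my/llm-curl-max-time` and `my/llm-curl-args` are hypothetical names made up for this example. The only real piece is curl's `--max-time` flag, which limits the whole transfer to the given number of seconds.

```elisp
;; Hypothetical sketch, NOT gptel's implementation.
;; `my/llm-curl-max-time' and `my/llm-curl-args' are invented names.

(defvar my/llm-curl-max-time 300
  "Maximum number of seconds curl may spend on a single request.
Slow local models may need a larger value.")

(defun my/llm-curl-args (url)
  "Return a list of curl arguments for URL.
The list passes `my/llm-curl-max-time' to curl's --max-time flag."
  (list "--silent"
        "--location"
        ;; curl expects the limit as a string of seconds.
        "--max-time" (number-to-string my/llm-curl-max-time)
        url))
```

With a variable like this, raising the limit for one slow request could be done by let-binding `my/llm-curl-max-time` around the call rather than editing the function that builds the argument list.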

karthink commented 11 months ago

Thank you for investigating the cause of the error, that's very helpful.

> Also it would be nice if we could set the default number of messages to send.

You can set an internal variable to control this, but it might break at some point in the future.

(setq-default gptel--num-messages-to-send 4)
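Since this is an internal variable that might change, one cautious way to apply the setting is to guard it, as in the minimal sketch below. It assumes that `gptel--num-messages-to-send` is defined once the `gptel` feature has loaded; if it is defined elsewhere, the guard will simply report that the variable was not found.

```elisp
;; Sketch only: guards against the internal variable being renamed or removed.
;; Assumption: the variable is defined by the time the `gptel' feature loads.
(with-eval-after-load 'gptel
  (if (boundp 'gptel--num-messages-to-send)
      ;; Set the default for all buffers; gptel-menu can still change it
      ;; per buffer.
      (setq-default gptel--num-messages-to-send 4)
    (message "gptel--num-messages-to-send not found; the internal name may have changed")))
```

If a later gptel release renames the variable, this prints a note in *Messages* instead of silently setting an unused variable.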

> I think it would be nice if there were a variable that we could tweak for the timeout.

Isn't it sufficient to raise the default timeout to 300 or 600s?

nickanderson commented 11 months ago

> Thank you for investigating the cause of the error, that's very helpful.

You are welcome, and thank you for the package and maintenance!

> (setq-default gptel--num-messages-to-send 4)

OK.

> Isn't it sufficient to raise the default timeout to 300 or 600s?

It is for me. So, here is a PR: https://github.com/karthink/gptel/pull/127