lrq3000 opened this issue 8 months ago
Can you use Wireshark on the loopback device to watch the communication and show what happens between the client and GPT4All?
Maybe these code examples will help you:
const OpenAI = require('openai');

const openai = new OpenAI({
  baseURL: 'http://127.0.0.1:4891/v1',
  apiKey: 'not needed for a local LLM',
});

async function main() {
  const text = await openai.chat.completions.create(
    {
      messages: [{ role: 'user', content: 'Hello' }], // placeholder prompt
      model: 'Nous Hermes 2 Mistral DPO',
      max_tokens: 1024,
      n: 1,
      stop: null,
      temperature: 0.35,
      top_p: 0.75,
      stream: false,
    },
    { maxRetries: 5 }
  );
  console.log(text);
  console.log(text.choices);
} // main
main();
const json_completion = JSON.stringify({
  stream: false,
  temperature: 0.6,
  max_tokens: 100,
  messages: [{ role: 'user', content: 'Hello' }],
  model: 'Nous Hermes 2 Mistral DPO',
});

const completions = await fetch('http://127.0.0.1:4891/v1/chat/completions', {
  keepalive: true,
  method: 'POST',
  mode: 'no-cors',
  // with this mode the request gets a response,
  // but for security reasons JS in the browser cannot access the result of "await completions.json()"
  headers: {
    Accept: 'application/json',
    'Content-Type': 'application/json',
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Headers': '*',
  },
  body: json_completion,
});

const completionjson = await completions.json();
/* This is a problem,
 * because the browser cannot use mode: "no-cors" and
 * then call "completions.json()" once the request finishes;
 * this results in an error like
 * Uncaught (in promise) SyntaxError: JSON.parse: unexpected end of data at line 1 column 1 of the JSON data
 * see https://stackoverflow.com/questions/54896998/how-to-process-fetch-response-from-an-opaque-type
 * Without mode: "no-cors" you will get an error like
 * XHR OPTIONS http://127.0.0.1:4891/v1/chat/completions CORS Preflight Did Not Succeed
 */
console.log(completionjson);
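If you want to see what the server actually answers to the browser's preflight, one way (just a sketch; the Origin value is a made-up example and the port assumes the default GPT4All server settings) is to simulate the OPTIONS preflight with curl:

# Simulate the browser's CORS preflight request; -i also prints the response headers,
# so you can check whether any Access-Control-Allow-* headers come back.
curl -i -X OPTIONS http://127.0.0.1:4891/v1/chat/completions \
  -H "Origin: http://localhost:3000" \
  -H "Access-Control-Request-Method: POST" \
  -H "Access-Control-Request-Headers: content-type"

If the server does not answer this with matching Access-Control-Allow-* headers, the browser aborts the real POST, which matches the "CORS Preflight Did Not Succeed" error above.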
Thank you @zwilch. I am quite rusty with Wireshark, so I'm going to need some time to debug it adequately this way.
Nevertheless, I tried to use curl, as an alternative to your two other suggested solutions, and I think this already sheds some light on the issue.
Here is what GPT4All spits out:
$ curl http://localhost:5001/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-coder-6.7b-instruct.Q8_0.gguf",
"messages": [{"role": "user", "content": "Hello! What is your name?"}]
}'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   489  100   354  100   135     66     25  0:00:05  0:00:05 --:--:--    85
{"choices":[{"finish_reason":"length","index":0,"message":{"content":"It seems like you forgot to say anything, could you please tell me again how","role":"assistant"}}],"created":1712489905,"id":"foobarbaz","model":"deepseek-coder-6.7b-instruct.Q8_0.gguf","object":"text_completion","usage":{"completion_tokens":16,"prompt_tokens":20,"total_tokens":36}}
I wrote before that it worked with curl. It does appear to work, but only superficially: looking at the exact output, the quality is very much subpar compared to what we could expect, with the model often producing gibberish sentences and stopping mid-sentence.
For comparison, here is what GPT4All outputs when the same model is queried from the GUI:
As an artificial intelligence, I don't have personal experiences or emotions like human beings do. Therefore, I am not named after individuals but rather by the programmers who designed me. My purpose is to assist users in providing information and answering questions based on my programming knowledge base. How can I help you today?
And here is what ollama outputs with the same model and prompt:
$ curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-coder:6.7b-instruct-Q8_0",
"messages": [{"role": "user", "content": "Hello! What is your name?"}]
}'
{"id":"chatcmpl-721","object":"chat.completion","created":1712479256,"model":"deepseek-coder:latest","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"As an AI Programming Assistant based on DeepSeek's model \"Deepseek Coder\", I don’t have a personal identity so it can be any person who has access to my features or services, such as the ability to respond in many languages. My design is focused around providing help and information related to computer science topics within this context of AI programming assistant service. How may I assist you with your coding needs today?\n"},"finish_reason":"stop"}],"usage":{"prompt_tokens":76,"completion_tokens":91,"total_tokens":167}}
So it seems that it's not just a formatting issue: the GPT4All OpenAI-like API server does not respond to queries the same way. Maybe it drops the default parameters? It outputs total gibberish and often stops mid-sentence.
So this issue does not seem to be related only to continuedev; the whole OpenAI-like API server function appears to be affected.
I am trying to test my hypothesis above that it's caused by missing parameters, but for the moment, when I try to pass the parameters explicitly, generation takes an infinite time.
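For reference, a request with the parameters passed explicitly would look roughly like this (a sketch only; the parameter values are illustrative and reuse the ones from the Node example above, with the same port and model name as before):

curl http://localhost:5001/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "deepseek-coder-6.7b-instruct.Q8_0.gguf",
  "messages": [{"role": "user", "content": "Hello! What is your name?"}],
  "max_tokens": 1024,
  "temperature": 0.35,
  "top_p": 0.75
}'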
Sorry, I haven't read through everything here, but it might be a templates/parameters issue, so:
Note that many models don't work all that well if you don't provide them with the expected templates. I don't think these are added automatically to any of the web API endpoints. The parameters can have a big influence, too.
What you should try:
@cosmic-snow Thank you for your suggestions, and although I will implement them in future tests to improve replicability, this is not a templating/parameters issue: the model works very well in GPT4All, and furthermore the issue inside Continue's chat is that it does not output anything, whatever the prompt.
(PS: I know how to edit the continue config file; I made it work with several models in koboldcpp, including the same model I am trying to use in gpt4all -- koboldcpp is also not supported by default in continue and must be manually configured as an OpenAI-like API server.)
... not a templating/parameters issue: the model works very well in GPT4All, and furthermore the issue inside Continue's chat is that it does not output anything, whatever the prompt.
Alright then, but are you sure? I'm not all that familiar with the GUI's API server, but I've spent a bit of time with that recently. It's certainly possible that it's not entirely compatible and something that's expected by the continue plugin is not actually returned by the server.
That is, it definitely doesn't mimic the OpenAI API in full.
However, looking at the output of your previous comment again:
GPT4All response excerpt:
... "usage":{"completion_tokens":16,"prompt_tokens":20,"total_tokens":36}
ollama response excerpt:
... "usage":{"prompt_tokens":76,"completion_tokens":91,"total_tokens":167}
Note how many more prompt_tokens it says it has used for the ollama prompt, although your own input is the same in both cases. My hunch here is that ollama adds templates, whereas in GPT4All you'd have to do that manually.
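To illustrate what I mean by adding the template manually (just a sketch; the wrapper below is a generic instruct-style template, not necessarily the exact one this model expects), the request would look something like this, and prompt_tokens would grow accordingly:

curl http://localhost:5001/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "deepseek-coder-6.7b-instruct.Q8_0.gguf",
  "messages": [{"role": "user", "content": "### Instruction:\nHello! What is your name?\n### Response:\n"}]
}'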
It's entirely possible that this isn't the only issue, though (to get everything to work, I mean). You might also want to run curl -v once in case there's a problem with the HTTP headers (or use a web API tool which shows more details).
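For example (the same request as before, just with verbose output; nothing here is GPT4All-specific):

# -v prints the request and response headers along with the body,
# which makes missing or unexpected HTTP headers easy to spot.
curl -v http://localhost:5001/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "deepseek-coder-6.7b-instruct.Q8_0.gguf",
  "messages": [{"role": "user", "content": "Hello! What is your name?"}]
}'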
I'll probably have a look at the continue plugin when I have some time.
I see, I missed this detail. I'll try to debug this further, but this is getting a bit beyond my current abilities; I need to train, and I'm not sure when I'll have time for that... But at least your indications are pointing me in the right direction, and I'll post further comments if I figure out how to do that.
(NB: I wanted to use HTTP Toolkit but it didn't work, then I tried Wireshark but for some reason I cannot see the exchange -- I must be doing something wrong -- so what remains is Frida.re. I think it would be more effective if I could catch and manipulate all the exchanges.)
I tested a few different backends and I think the issue is that the server doesn't support streaming responses, while the continuedev extension requires them.
Every backend that worked returned a streaming response.
There is also a parameter stream: true in the incoming data:
{"messages":[{"role":"user","content":"hello"}],"model":"Llama 3 Instruct","max_tokens":1024,"stream":true}
It should do streaming: https://docs.gpt4all.io/gpt4all_python.html#chatting-with-gpt4all
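For what it's worth, this is roughly what a streaming request and the expected server-sent-events (SSE) response look like for OpenAI-compatible servers (a sketch only; the exact chunk payloads vary by server):

# -N disables curl's output buffering so SSE chunks are printed as they arrive
curl -N http://127.0.0.1:4891/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Llama 3 Instruct",
  "messages": [{"role": "user", "content": "hello"}],
  "stream": true
}'

# An OpenAI-compatible server then answers with "data:" lines such as:
# data: {"choices":[{"index":0,"delta":{"content":"Hel"}}],...}
# data: {"choices":[{"index":0,"delta":{"content":"lo"}}],...}
# data: [DONE]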
The GPT4All v3.0.0 client has a "Server Chat" section which correctly shows the responses to queries received from VSCode/Continue as they arrive, but I can confirm that, at least when configured as the OP suggests, these responses don't make it back into Continue.
Will there be any fix for this?
Sorry, last time I tried to really look into it I got held up, so I shelved it for a while.
I tested a few different backends and I think the issue is that the server doesn't support streaming responses, while the continuedev extension requires them.
Every backend that worked returned a streaming response.
True, the server mode currently doesn't implement streaming responses. If that's a hard requirement, then I guess this is the problem here.
Will there be any fix for this?
I can't really say what the plans are right now, sorry. Improvements to the server mode are mentioned on the roadmap, however.
I got it working with a stopgap solution, https://github.com/continuedev/continue/pull/2097. I'll see if I can make changes to gpt4all to support SSE.
I've added support for SSE response in this PR, https://github.com/nomic-ai/gpt4all/pull/2910, and tested with prod continue.dev version, it seems to be working.
Awesome @raja-jamwal, thank you so much! I hope this will get merged soon! GPT4All is so much more efficient than other LLM runners such as ollama; with other runners I literally cannot run the best models my computer can handle.
Bug Report
I tried to use GPT4All as a local LLM server with an OpenAI-like API for serving as a code copilot via the continue plugin for VSCode.
Unfortunately, whatever I tried, it did not work.
The server is correctly detected and all models are correctly loaded, using continue prerelease. However, when trying to send any message to gpt4all from continue, the response seems to be empty.
However, when I do my own curl query it works, so I don't know how to debug this further.
I have tried with Ollama and Koboldcpp (via the OpenAI-like API, same settings as for GPT4All - of course I changed the ports), and it worked for both flawlessly.
This seems to me to be an incompatibility in the API: Continue is expecting something that GPT4All is either not providing, or not providing in the expected format.
Steps to Reproduce
Expected Behavior
Continue should get non-empty responses from GPT4All.
Your Environment