tmc / langchaingo

LangChain for Go, the easiest way to write LLM-based programs in Go
https://tmc.github.io/langchaingo/
MIT License

500 when hitting llama.cpp's OpenAI compatible API at localhost:8000/v1 #667

Open NinjaPerson24119 opened 6 months ago

NinjaPerson24119 commented 6 months ago

As described in the title, this doesn't seem to work with llama.cpp, and I'm not sure what the problem is. The same code hits OpenAI fine, and a different program can hit my llama.cpp server no problem.

tmc commented 6 months ago

Could you run with more debugging enabled, as shown in https://github.com/tmc/langchaingo/tree/main/examples/openai-completion-example-with-http-debugging, to get more information about the 500? A minimal sketch of that approach is below.
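That example boils down to wiring a logging http.Client into the OpenAI client. Roughly (the base URL, model, and token here mirror your setup and are placeholders; logTransport is just an illustrative name):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/openai"
)

// logTransport dumps every outgoing request and incoming response, so the
// exact JSON reaching llama.cpp (and the body of its 500) is visible.
type logTransport struct{ base http.RoundTripper }

func (t *logTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if dump, err := httputil.DumpRequestOut(req, true); err == nil {
		log.Printf("Request:\n%s", dump)
	}
	resp, err := t.base.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	if dump, derr := httputil.DumpResponse(resp, true); derr == nil {
		log.Printf("Response:\n%s", dump)
	}
	return resp, nil
}

func main() {
	client := &http.Client{Transport: &logTransport{base: http.DefaultTransport}}
	llm, err := openai.New(
		openai.WithBaseURL("http://localhost:8000/v1"), // llama.cpp server, per the report
		openai.WithToken("dummy"),
		openai.WithModel("dummy"),
		openai.WithHTTPClient(client),
	)
	if err != nil {
		log.Fatal(err)
	}
	out, err := llms.GenerateFromSinglePrompt(context.Background(), llm,
		"How many primary colors are there?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```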

NinjaPerson24119 commented 6 months ago
2024/03/13 18:09:00 Request:
POST /v1/chat/completions HTTP/1.1
Host: localhost:8000
User-Agent: Go-http-client/1.1
Content-Length: 147
Authorization: Bearer <token>
Content-Type: application/json
Accept-Encoding: gzip

{"model":"dummy","messages":[{"role":"user","content":[{"text":"How many primary colors are there?","type":"text"}]}],"temperature":0}
2024/03/13 18:09:00 Response:
HTTP/1.1 500 Internal Server Error
Content-Length: 91
Access-Control-Allow-Origin: 
Content-Type: text/plain; charset=utf-8
Keep-Alive: timeout=5, max=5
Server: llama.cpp

500 Internal Server Error
[json.exception.type_error.302] type must be string, but is array
Error: error generating completion: API returned unexpected status code: 500

I noticed that the request shape differs entirely from what other OpenAI-compatible clients send.

Here's the output from a proxy I wrote to log the requests:

Langchaingo (gives error)

Received completion request {'model': 'dummy', 'messages': [{'role': 'user', 'content': [{'text': 'How many primary colors are there?', 'type': 'text'}]}], 'temperature': 0}

Continue extension for VSCode (this one works fine)

Received completion request {'messages': [{'role': 'user', 'content': 'How many primary colors are there?'}], 'model': 'deepseek-33b', 'max_tokens': 1024, 'stream': True, 'temperature': 0.1, 'repeat_penalty': 1}
127.0.0.1 - - [13/Mar/2024 18:15:01] "POST /v1/chat/completions HTTP/1.1" 200 -

The key difference is whether 'content' is an array or a string.
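As a client-side stopgap until the payload shapes are reconciled, one could rewrite the outgoing body so that text-part arrays are collapsed into the plain string this llama.cpp build expects. This is only a sketch, not anything in the library; flattenTransport is a hypothetical helper wired in via the same WithHTTPClient option used above:

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/openai"
)

// flattenTransport rewrites outgoing chat payloads so that any message
// "content" encoded as an array of {"type":"text","text":...} parts is
// collapsed into a single string before it reaches llama.cpp.
type flattenTransport struct{ base http.RoundTripper }

func (t *flattenTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if req.Body != nil {
		raw, err := io.ReadAll(req.Body)
		req.Body.Close()
		if err != nil {
			return nil, err
		}
		var payload map[string]any
		if json.Unmarshal(raw, &payload) == nil {
			if msgs, ok := payload["messages"].([]any); ok {
				for _, m := range msgs {
					msg, ok := m.(map[string]any)
					if !ok {
						continue
					}
					parts, ok := msg["content"].([]any)
					if !ok {
						continue // content is already a plain string
					}
					var sb strings.Builder
					for _, p := range parts {
						if pm, ok := p.(map[string]any); ok {
							if txt, ok := pm["text"].(string); ok {
								sb.WriteString(txt)
							}
						}
					}
					msg["content"] = sb.String()
				}
				if fixed, err := json.Marshal(payload); err == nil {
					raw = fixed
				}
			}
		}
		req.Body = io.NopCloser(bytes.NewReader(raw))
		req.ContentLength = int64(len(raw))
	}
	return t.base.RoundTrip(req)
}

func main() {
	client := &http.Client{Transport: &flattenTransport{base: http.DefaultTransport}}
	llm, err := openai.New(
		openai.WithBaseURL("http://localhost:8000/v1"),
		openai.WithToken("dummy"),
		openai.WithModel("dummy"),
		openai.WithHTTPClient(client),
	)
	if err != nil {
		log.Fatal(err)
	}
	out, err := llms.GenerateFromSinglePrompt(context.Background(), llm,
		"How many primary colors are there?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```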

devalexandre commented 6 months ago

@NinjaPerson24119 llama.cpp's payload format differs from the one OpenAI describes, hence the error. After the llamafile PR, this is on my list :)


devalexandre commented 6 months ago

@NinjaPerson24119 you can use llamafile to talk to a llama.cpp server now, as of v1.6.0 :)
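A minimal sketch of that path, assuming the llms/llamafile package follows the same New + GenerateFromSinglePrompt pattern as the other langchaingo providers (check the package docs for the actual constructor and options):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/llamafile"
)

func main() {
	// Assumes a llamafile/llama.cpp server is already listening on its
	// default address; the constructor shape mirrors other providers.
	llm, err := llamafile.New()
	if err != nil {
		log.Fatal(err)
	}
	out, err := llms.GenerateFromSinglePrompt(context.Background(), llm,
		"How many primary colors are there?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```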

devalexandre commented 6 months ago

@tmc could you close this?