mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many other model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
https://localai.io
MIT License

curl: (52) Empty reply from server #320

Open ralyodio opened 1 year ago

ralyodio commented 1 year ago
$ curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin",            
     "prompt": "A long time ago in a galaxy far, far away",
     "temperature": 0.7
   }'
curl: (52) Empty reply from server

Running this command crashes the Docker container.

ralyodio commented 1 year ago

I believe this model isn't supported, but ideally the server wouldn't crash.

Aisuko commented 1 year ago

Please show us more detail, for example the container's logs.
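
For reference, one way to capture those logs with the docker-compose setup from the README (the service name api is an assumption; adjust to your compose file):

# Follow the API container's logs while reproducing the request:
docker compose logs -f api

# Or, for a standalone container:
docker logs -f localai-api-1

# Setting DEBUG=true in the container's environment makes LocalAI
# log each backend it tries, as in the output below.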

ralyodio commented 1 year ago
localai-api-1  | 1:02PM DBG [llama] Attempting to load
localai-api-1  | 1:02PM DBG Loading model llama from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:02PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | llama.cpp: loading model from /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | error loading model: llama.cpp: tensor '���x�7�h' should not be 1004879872-dimensional
localai-api-1  | llama_init_from_file: failed to load model
localai-api-1  | 1:03PM DBG [llama] Fails: failed loading model
localai-api-1  | 1:03PM DBG [gpt4all-llama] Attempting to load
localai-api-1  | 1:03PM DBG Loading model gpt4all-llama from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | llama.cpp: loading model from /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | error loading model: unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file?
localai-api-1  | gptjllama_init_from_file: failed to load model
localai-api-1  | LLAMA ERROR: failed to load model from /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | gpt2_model_load: loading model from '/models/gml-gpt4all-j.bin'
localai-api-1  | stablelm_model_load: loading model from '/models/gml-gpt4all-j.bin' - please wait ...
localai-api-1  | dollyv2_model_load: loading model from '/models/gml-gpt4all-j.bin' - please wait ...
localai-api-1  | redpajama_model_load: loading model from '/models/gml-gpt4all-j.bin' - please wait ...
localai-api-1  | replit_model_load: loading model from '/models/gml-gpt4all-j.bin' - please wait ...
localai-api-1  | gpt_neox_model_load: loading model from '/models/gml-gpt4all-j.bin' - please wait ...
localai-api-1  | bert_load_from_file: loading model from '/models/gml-gpt4all-j.bin' - please wait ...
localai-api-1  | starcoder_model_load: loading model from '/models/gml-gpt4all-j.bin'
localai-api-1  | 1:03PM DBG [gpt4all-llama] Fails: failed loading model
localai-api-1  | 1:03PM DBG [gpt4all-mpt] Attempting to load
localai-api-1  | 1:03PM DBG Loading model gpt4all-mpt from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | mpt_model_load: loading model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' - please wait ...
localai-api-1  | mpt_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1  | GPT-J ERROR: failed to load model from /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG [gpt4all-mpt] Fails: failed loading model
localai-api-1  | 1:03PM DBG [gpt4all-j] Attempting to load
localai-api-1  | 1:03PM DBG Loading model gpt4all-j from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | gptj_model_load: loading model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' - please wait ...
localai-api-1  | gptj_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1  | GPT-J ERROR: failed to load model from /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG [gpt4all-j] Fails: failed loading model
localai-api-1  | 1:03PM DBG [gpt2] Attempting to load
localai-api-1  | 1:03PM DBG Loading model gpt2 from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | gpt2_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1  | gpt2_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1  | 1:03PM DBG [gpt2] Fails: failed loading model
localai-api-1  | 1:03PM DBG [stablelm] Attempting to load
localai-api-1  | 1:03PM DBG Loading model stablelm from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | stablelm_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1  | stablelm_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1  | 1:03PM DBG [stablelm] Fails: failed loading model
localai-api-1  | 1:03PM DBG [dolly] Attempting to load
localai-api-1  | 1:03PM DBG Loading model dolly from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | dollyv2_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1  | dolly_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1  | 1:03PM DBG [dolly] Fails: failed loading model
localai-api-1  | 1:03PM DBG [redpajama] Attempting to load
localai-api-1  | 1:03PM DBG Loading model redpajama from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | redpajama_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1  | redpajama_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1  | 1:03PM DBG [redpajama] Fails: failed loading model
localai-api-1  | 1:03PM DBG [replit] Attempting to load
localai-api-1  | 1:03PM DBG Loading model replit from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | replit_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1  | replit_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1  | 1:03PM DBG [replit] Fails: failed loading model
localai-api-1  | 1:03PM DBG [gptneox] Attempting to load
localai-api-1  | 1:03PM DBG Loading model gptneox from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | gpt_neox_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1  | gpt_neox_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1  | 1:03PM DBG [gptneox] Fails: failed loading model
localai-api-1  | 1:03PM DBG [bert-embeddings] Attempting to load
localai-api-1  | 1:03PM DBG Loading model bert-embeddings from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | bert_load_from_file: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1  | bert_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1  | 1:03PM DBG [bert-embeddings] Fails: failed loading model
localai-api-1  | 1:03PM DBG [starcoder] Attempting to load
localai-api-1  | 1:03PM DBG Loading model starcoder from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1  | starcoder_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1  | starcoder_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1  | 1:03PM DBG [starcoder] Fails: failed loading model
localai-api-1  | [172.18.0.1]:42438  500  -  POST     /v1/completions
ralyodio commented 1 year ago

All subsequent requests hang, even after switching to a known-good model.

Aisuko commented 1 year ago

Thanks for your feedback. The loader tries each backend in a loop, so you may want to check again later. If you want a quick start, I suggest using the models here: https://github.com/go-skynet/model-gallery. We will investigate this issue.
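
For reference, the gallery can be installed through the API itself; a minimal sketch based on the gallery docs of the time (the endpoint and the gpt4all-j.yaml entry are assumptions, so check the gallery README for current names):

# Ask the running LocalAI instance to install a known-good model:
curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
     "url": "github:go-skynet/model-gallery/gpt4all-j.yaml"
   }'
# The model then shows up under /v1/models once the download finishes.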

jaskaran-online commented 1 year ago

I have the same issue.

elven2016 commented 1 year ago

Same issue here too. My models are ggml-gpt4all-j.bin and vicuna-13b-v1.1. The API container starts up successfully, but the API doesn't seem to work: curl http://localhost:8080/v1/models returns curl: (52) Empty reply from server.

madorb commented 1 year ago

I got this same result following exactly the steps in the tutorial in the README.

zcong1993 commented 1 year ago

I got the same issue when using the ggml-gpt4all-j model, and I solved it later by increasing Docker's memory limit. I think it might be an OOM error when loading models with a low container memory limit.
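
If OOM is the suspect, raising the container's memory limit is a quick test; a sketch, assuming the docker-compose setup from the README (a 13B q4_0 model needs roughly 8-10 GB of RAM, so 16g is a deliberately generous figure):

# docker-compose.yml: add a limit under the api service, e.g.
#   services:
#     api:
#       mem_limit: 16g

# Or with a standalone container:
docker run -d --name local-ai \
  --memory=16g \
  -p 8080:8080 \
  -v "$PWD/models:/models" \
  quay.io/go-skynet/local-ai:latest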

arch-user-france1 commented 1 year ago

I just tried the example (wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j) and I could not even call /v1/models; it would just say Empty reply from server.

I'm on an ARM device, an M2 to be precise.

arch-user-france1 commented 1 year ago

After manually building it, everything seems to work just fine...
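
For anyone else on ARM who wants to try the same workaround, a rough sketch of the local build described in the README (targets and flags may have changed since, so treat this as an outline):

# Requires Go plus a C/C++ toolchain (make, cmake):
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
make build

# Point the binary at a local models directory:
./local-ai --models-path ./models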

caleboleary commented 1 year ago

In case this helps anyone else, I received this response to the get models curl in the quick start, but about 10 minutes later my model appeared when I asked it again, so perhaps they take some time to load into RAM or something.
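
If the model is just slow to load, polling the models endpoint removes the guesswork; a minimal sketch assuming the default port 8080:

# Retry every 15 seconds until the API answers (curl exits non-zero
# on the "Empty reply" error too, so the loop keeps waiting):
until curl -sf http://localhost:8080/v1/models > /dev/null; do
  echo "LocalAI not ready yet, retrying in 15s..."
  sleep 15
done
curl http://localhost:8080/v1/models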

Yashupadhyaya commented 1 year ago

I tried using a starcoder.cpp model and got the same error; below are the Docker logs.

llama.cpp: loading model from /models/starcoder-ggml-q4_1.bin
error loading model: llama.cpp: tensor '' should not be 999572548-dimensional
llama_init_from_file: failed to load model
llama.cpp: loading model from /models/starcoder-ggml-q4_1.bin
error loading model: missing tok_embeddings.weight
gptjllama_init_from_file: failed to load model
LLAMA ERROR: failed to load model from /models/starcoder-ggml-q4_1.bin
mpt_model_load: invalid model file '/models/starcoder-ggml-q4_1.bin' (bad magic)
GPT-J ERROR: failed to load model from /models/starcoder-ggml-q4_1.bin
gptj_model_load: invalid model file '/models/starcoder-ggml-q4_1.bin' (bad vocab size 13 != 49152)
mpt_model_load: loading model from '/models/starcoder-ggml-q4_1.bin' - please wait ...
gptj_model_load: loading model from '/models/starcoder-ggml-q4_1.bin' - please wait ...
gptj_model_load: n_vocab = 49152
gptj_model_load: n_ctx  = 8192
gptj_model_load: n_embd = 6144
gptj_model_load: n_head = 48
gptj_model_load: n_layer = 40
gptj_model_load: n_rot  = 1003
gptj_model_load: f16   = 49152
GPT-J ERROR: failed to load model from /models/starcoder-ggml-q4_1.bin
Starting LocalAI using 4 threads, with models path: /models

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.46.0                   │
 │              http://127.0.0.1:8080/               │
 │       (bound on host 0.0.0.0 and port 8080)       │
 │                                                   │
 │ Handlers ............ 23  Processes ........... 1 │
 │ Prefork ....... Disabled  PID ................. 1 │
 └───────────────────────────────────────────────────┘

llama.cpp: loading model from /models/starcoder-ggml-q4_1.bin
error loading model: llama.cpp: tensor '' should not be 999572548-dimensional
llama_init_from_file: failed to load model
llama.cpp: loading model from /models/starcoder-ggml-q4_1.bin
error loading model: missing tok_embeddings.weight
gptjllama_init_from_file: failed to load model
LLAMA ERROR: failed to load model from /models/starcoder-ggml-q4_1.bin
mpt_model_load: loading model from '/models/starcoder-ggml-q4_1.bin' - please wait ...
mpt_model_load: invalid model file '/models/starcoder-ggml-q4_1.bin' (bad magic)
GPT-J ERROR: failed to load model from /models/starcoder-ggml-q4_1.bin
gptj_model_load: invalid model file '/models/starcoder-ggml-q4_1.bin' (bad vocab size 13 != 49152)
gptj_model_load: loading model from '/models/starcoder-ggml-q4_1.bin' - please wait ...
gptj_model_load: n_vocab = 49152
gptj_model_load: n_ctx  = 8192
gptj_model_load: n_embd = 6144
gptj_model_load: n_head = 48
gptj_model_load: n_layer = 40
gptj_model_load: n_rot  = 1003
gptj_model_load: f16   = 49152
GPT-J ERROR: failed to load model from /models/starcoder-ggml-q4_1.bin
rai62 commented 1 year ago

I think the cause is https://github.com/go-skynet/go-ggml-transformers.cpp/blob/master/mpt.cpp#L196. I wonder if checking for the magic is really necessary.
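
Related: you can check a model file's magic yourself before handing it to LocalAI; a sketch assuming a little-endian host, which matches how the loaders read the field. The 67676a74 value is the GGJT magic visible in the log above; 67676d6c is plain GGML:

# Print the first four bytes of the file as one 32-bit word:
od -An -t x4 -N 4 /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
# 67676a74 -> "ggjt" (llama.cpp GGJT format)
# 67676d6c -> "ggml" (older plain GGML format)
# anything else -> not a GGML-family file; every backend will reject it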

Free-Radical commented 1 year ago

Same issue here, until I built it locally following the instructions; it's working now 😊

boixu commented 1 year ago

Also had this curl: (52) Empty reply from server error. It turns out it was caused by OOM; increasing the server's memory fixed the issue. But this should be surfaced somehow, with an error message or something, rather than just crashing and leaving users to figure it out blindly.
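
Until the server surfaces this itself, you can at least confirm an OOM kill from the outside; a short sketch (the container name is assumed from the compose logs earlier in this thread):

# Prints true if the kernel OOM-killed the container's main process:
docker inspect --format '{{.State.OOMKilled}}' localai-api-1

# Watch live memory use while the model loads:
docker stats localai-api-1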

jimbydude commented 1 year ago

> In case this helps anyone else, I received this response to the get models curl in the quick start, but about 10 minutes later my model appeared when I asked it again, so perhaps they take some time to load into RAM or something.

Yes, this was it for me. I waited about 5 minutes and it started working, so as mentioned it just takes a little time to start up and load the models.

DuncanLHS commented 1 year ago

As others have said, it can take a few minutes to build everything. I'm using Docker Desktop on Windows; if you keep an eye on the logs, it should be available once you see something like the Fiber startup banner shown in the log above.

This is from following the gpt4all example.

localai-bot commented 1 year ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

But... I can also be funny or helpful :smilecat:, and I can generally give good tips or point to places to look in the documentation or the code, based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

:warning::warning::warning::warning::warning:

The issue seems to be related to the OpenAI API not being able to find the requested model "gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin". This could be due to a misconfiguration in the localai project or the OpenAI API itself. To troubleshoot this issue, you can try the following steps:

  1. Check if the model "gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin" is present in the "models" directory specified in the localai project's Dockerfile. If it is not, try rebuilding the model using the OpenAI API or downloading it from a different source.
  2. Check if the OpenAI API is properly configured to work with the localai project. Ensure that the API is running and that the required environment variables are set correctly in the Docker container.
  3. If the issue persists, try restarting the Docker container and rebuilding the localai project from scratch to ensure that all dependencies and configurations are correctly installed.

JWinFlexport commented 11 months ago

> In case this helps anyone else, I received this response to the get models curl in the quick start, but about 10 minutes later my model appeared when I asked it again, so perhaps they take some time to load into RAM or something.

Same for me, but mine takes longer to respond, around 20 minutes. I'm using an M1 MacBook.