Open ralyodio opened 1 year ago
I believe this model isn't supported, but ideally the server wouldn't crash.
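For reference, the failing request is a completions call along these lines (a hypothetical reconstruction: the prompt and temperature are illustrative, but the model name matches the POST /v1/completions recorded in the logs below):

```sh
# Hypothetical reproduction; payload values other than the model name are illustrative.
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin",
    "prompt": "Hello",
    "temperature": 0.7
  }'
```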
Please show us more detail, for example the container's logs.
localai-api-1 | 1:02PM DBG [llama] Attempting to load
localai-api-1 | 1:02PM DBG Loading model llama from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:02PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | llama.cpp: loading model from /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | error loading model: llama.cpp: tensor '���x�7�h' should not be 1004879872-dimensional
localai-api-1 | llama_init_from_file: failed to load model
localai-api-1 | 1:03PM DBG [llama] Fails: failed loading model
localai-api-1 | 1:03PM DBG [gpt4all-llama] Attempting to load
localai-api-1 | 1:03PM DBG Loading model gpt4all-llama from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | llama.cpp: loading model from /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | error loading model: unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file?
localai-api-1 | gptjllama_init_from_file: failed to load model
localai-api-1 | LLAMA ERROR: failed to load model from /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | gpt2_model_load: loading model from '/models/gml-gpt4all-j.bin'
localai-api-1 | stablelm_model_load: loading model from '/models/gml-gpt4all-j.bin' - please wait ...
localai-api-1 | dollyv2_model_load: loading model from '/models/gml-gpt4all-j.bin' - please wait ...
localai-api-1 | redpajama_model_load: loading model from '/models/gml-gpt4all-j.bin' - please wait ...
localai-api-1 | replit_model_load: loading model from '/models/gml-gpt4all-j.bin' - please wait ...
localai-api-1 | gpt_neox_model_load: loading model from '/models/gml-gpt4all-j.bin' - please wait ...
localai-api-1 | bert_load_from_file: loading model from '/models/gml-gpt4all-j.bin' - please wait ...
localai-api-1 | starcoder_model_load: loading model from '/models/gml-gpt4all-j.bin'
localai-api-1 | 1:03PM DBG [gpt4all-llama] Fails: failed loading model
localai-api-1 | 1:03PM DBG [gpt4all-mpt] Attempting to load
localai-api-1 | 1:03PM DBG Loading model gpt4all-mpt from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | mpt_model_load: loading model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' - please wait ...
localai-api-1 | mpt_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1 | GPT-J ERROR: failed to load model from /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG [gpt4all-mpt] Fails: failed loading model
localai-api-1 | 1:03PM DBG [gpt4all-j] Attempting to load
localai-api-1 | 1:03PM DBG Loading model gpt4all-j from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | gptj_model_load: loading model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' - please wait ...
localai-api-1 | gptj_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1 | GPT-J ERROR: failed to load model from /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG [gpt4all-j] Fails: failed loading model
localai-api-1 | 1:03PM DBG [gpt2] Attempting to load
localai-api-1 | 1:03PM DBG Loading model gpt2 from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | gpt2_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1 | gpt2_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1 | 1:03PM DBG [gpt2] Fails: failed loading model
localai-api-1 | 1:03PM DBG [stablelm] Attempting to load
localai-api-1 | 1:03PM DBG Loading model stablelm from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | stablelm_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1 | stablelm_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1 | 1:03PM DBG [stablelm] Fails: failed loading model
localai-api-1 | 1:03PM DBG [dolly] Attempting to load
localai-api-1 | 1:03PM DBG Loading model dolly from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | dollyv2_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1 | dolly_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1 | 1:03PM DBG [dolly] Fails: failed loading model
localai-api-1 | 1:03PM DBG [redpajama] Attempting to load
localai-api-1 | 1:03PM DBG Loading model redpajama from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | redpajama_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1 | redpajama_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1 | 1:03PM DBG [redpajama] Fails: failed loading model
localai-api-1 | 1:03PM DBG [replit] Attempting to load
localai-api-1 | 1:03PM DBG Loading model replit from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | replit_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1 | replit_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1 | 1:03PM DBG [replit] Fails: failed loading model
localai-api-1 | 1:03PM DBG [gptneox] Attempting to load
localai-api-1 | 1:03PM DBG Loading model gptneox from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | gpt_neox_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1 | gpt_neox_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1 | 1:03PM DBG [gptneox] Fails: failed loading model
localai-api-1 | 1:03PM DBG [bert-embeddings] Attempting to load
localai-api-1 | 1:03PM DBG Loading model bert-embeddings from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | bert_load_from_file: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1 | bert_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1 | 1:03PM DBG [bert-embeddings] Fails: failed loading model
localai-api-1 | 1:03PM DBG [starcoder] Attempting to load
localai-api-1 | 1:03PM DBG Loading model starcoder from gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | 1:03PM DBG Loading model in memory from file: /models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin
localai-api-1 | starcoder_model_load: invalid model file '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin' (bad magic)
localai-api-1 | starcoder_bootstrap: failed to load model from '/models/gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin'
localai-api-1 | 1:03PM DBG [starcoder] Fails: failed loading model
localai-api-1 | [172.18.0.1]:42438 500 - POST /v1/completions
All subsequent requests hang, even after switching to a known-good model.
Thanks for your feedback. What you're seeing in the log is the loader trying each backend in a loop. For a quick start, I suggest using the models from https://github.com/go-skynet/model-gallery, and we will investigate this issue.
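The gallery flow is roughly the following, sketched from the model-gallery README (the endpoint and the exact config URL may differ in your version, so check the README rather than taking this verbatim):

```sh
# Sketch: ask LocalAI to install a known-good model definition from the
# gallery. The gpt4all-j.yaml config name is taken from the gallery README.
curl http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml"}'
```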
I have the same issue.
Same issue here too. My models are ggml-gpt4all-j.bin and vicuna-13b-v1.1. The API container starts up successfully, but the API doesn't seem to work: curl http://localhost:8080/v1/models returns curl: (52) Empty reply from server.
I got this same result following exactly the steps in the README tutorial.
I got the same issue when using the ggml-gpt4all-j model, and I solved it later by increasing Docker's memory limit. I think it is an OOM error when loading models under a low container memory limit.
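Which knob to turn depends on your setup: on Docker Desktop it's the VM memory under Settings > Resources; with compose, a limit like this sketch works (the service name and the 16g value are illustrative, size it to your model):

```yaml
# Compose sketch: raise the container memory limit so large GGML models
# can be loaded. Adjust the service name and size for your deployment.
services:
  api:
    mem_limit: 16g
```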
I just tried the example (wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j) and I could not even call /v1/models; it would just say "Empty reply from server".
I'm on an ARM device, an M2 to be precise.
After manually building it, everything seems to work just fine...
In case this helps anyone else, I received this response to the get models curl in the quick start, but about 10 minutes later my model appeared when I asked it again, so perhaps they take some time to load into RAM or something.
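If you'd rather script the wait than guess, a loop like this polls until the endpoint answers (assumes the default port 8080; the interval is arbitrary):

```sh
# Retry /v1/models until LocalAI stops returning an empty reply.
until curl -sf http://localhost:8080/v1/models > /dev/null; do
  echo "LocalAI not ready yet, retrying in 30s..."
  sleep 30
done
echo "Model list is being served."
```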
I tried using a starcoder.cpp model and got the same error; the Docker logs are below.
llama.cpp: loading model from /models/starcoder-ggml-q4_1.bin
error loading model: llama.cpp: tensor '' should not be 999572548-dimensional
llama_init_from_file: failed to load model
llama.cpp: loading model from /models/starcoder-ggml-q4_1.bin
error loading model: missing tok_embeddings.weight
gptjllama_init_from_file: failed to load model
LLAMA ERROR: failed to load model from /models/starcoder-ggml-q4_1.bin
mpt_model_load: invalid model file '/models/starcoder-ggml-q4_1.bin' (bad magic)
GPT-J ERROR: failed to load model from /models/starcoder-ggml-q4_1.bin
gptj_model_load: invalid model file '/models/starcoder-ggml-q4_1.bin' (bad vocab size 13 != 49152)
mpt_model_load: loading model from '/models/starcoder-ggml-q4_1.bin' - please wait ...
gptj_model_load: loading model from '/models/starcoder-ggml-q4_1.bin' - please wait ...
gptj_model_load: n_vocab = 49152
gptj_model_load: n_ctx = 8192
gptj_model_load: n_embd = 6144
gptj_model_load: n_head = 48
gptj_model_load: n_layer = 40
gptj_model_load: n_rot = 1003
gptj_model_load: f16 = 49152
GPT-J ERROR: failed to load model from /models/starcoder-ggml-q4_1.bin
Starting LocalAI using 4 threads, with models path: /models
┌───────────────────────────────────────────────────┐
│ Fiber v2.46.0 │
│ http://127.0.0.1:8080/ │
│ (bound on host 0.0.0.0 and port 8080) │
│ │
│ Handlers ............ 23 Processes ........... 1 │
│ Prefork ....... Disabled PID ................. 1 │
└───────────────────────────────────────────────────┘
llama.cpp: loading model from /models/starcoder-ggml-q4_1.bin
error loading model: llama.cpp: tensor '' should not be 999572548-dimensional
llama_init_from_file: failed to load model
llama.cpp: loading model from /models/starcoder-ggml-q4_1.bin
error loading model: missing tok_embeddings.weight
gptjllama_init_from_file: failed to load model
LLAMA ERROR: failed to load model from /models/starcoder-ggml-q4_1.bin
mpt_model_load: loading model from '/models/starcoder-ggml-q4_1.bin' - please wait ...
mpt_model_load: invalid model file '/models/starcoder-ggml-q4_1.bin' (bad magic)
GPT-J ERROR: failed to load model from /models/starcoder-ggml-q4_1.bin
gptj_model_load: invalid model file '/models/starcoder-ggml-q4_1.bin' (bad vocab size 13 != 49152)
gptj_model_load: loading model from '/models/starcoder-ggml-q4_1.bin' - please wait ...
gptj_model_load: n_vocab = 49152
gptj_model_load: n_ctx = 8192
gptj_model_load: n_embd = 6144
gptj_model_load: n_head = 48
gptj_model_load: n_layer = 40
gptj_model_load: n_rot = 1003
gptj_model_load: f16 = 49152
GPT-J ERROR: failed to load model from /models/starcoder-ggml-q4_1.bin
I think the cause is https://github.com/go-skynet/go-ggml-transformers.cpp/blob/master/mpt.cpp#L196. I wonder if checking for the magic is really necessary.
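For context, that check is the usual GGML header guard. A minimal sketch of what it does (the real mpt.cpp reads from a stream it already holds and logs differently):

```cpp
// Minimal sketch of a GGML magic check, not the verbatim mpt.cpp code.
#include <cstdint>
#include <cstdio>

static bool has_ggml_magic(const char *fname) {
    std::FILE *f = std::fopen(fname, "rb");
    if (!f) return false;
    std::uint32_t magic = 0;
    const bool ok = std::fread(&magic, sizeof(magic), 1, f) == 1;
    std::fclose(f);
    // 0x67676d6c is the plain GGML magic. The file in this issue reports
    // 0x67676a74 ("ggjt"), a llama.cpp format these backends cannot parse,
    // so rejecting on magic is what lets a loader bail out safely.
    return ok && magic == 0x67676d6c;
}
```

The check itself seems necessary: without it a backend would read garbage tensor metadata (like the 1004879872-dimensional tensor above), so if anything crashes it is more likely whatever happens after a backend rejects the file.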
Same issue here, until I built it locally following the instructions; it is working now 😊
Also had this curl: (52) Empty reply from server error. It turned out to be caused by OOM; increasing the server's memory fixed the issue. But this should be surfaced somehow, with an error message or similar, not just a crash that you have to figure out blindly.
> In case this helps anyone else, I received this response to the get models curl in the quick start, but about 10 minutes later my model appeared when I asked it again, so perhaps they take some time to load into RAM or something.

Yes, same for me. I waited about 5 minutes and it started working, so as mentioned it just takes a little time to start up and load the models.
As others have said, it can take a few minutes to build everything. I'm using Docker Desktop on Windows, and if you keep an eye on the logs it should be available once you see the Fiber startup banner like the one quoted above. This is from following the gpt4all example.
Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!
but... I can also be funny or helpful :smile_cat:, and I can provide, generally speaking, good tips or places to look in the documentation or in the code, based on what you wrote in the issue.
Don't engage in conversation with me, I don't support (yet) replying!
The issue seems to be related to the OpenAI API not being able to find the requested model "gpt4-x-alpaca-13b-ggml-q4_0-cuda.bin". This could be due to a misconfiguration in the localai project or the OpenAI API itself. To troubleshoot this issue, you can try the following steps:
> In case this helps anyone else, I received this response to the get models curl in the quick start, but about 10 minutes later my model appeared when I asked it again, so perhaps they take some time to load into RAM or something.

Same for me, but mine takes longer, around 20 minutes. I'm using an M1 MacBook.
Running this command crashes the Docker container.