arsaboo opened 12 months ago
Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!
but... I can also be funny or helpful :smile_cat: and I can generally provide good tips or point you to places in the documentation or the code to look at, based on what you wrote in the issue.
Don't engage in conversation with me, I don't support (yet) replying!
I see that the issue is related to power management. Is there any specific question or problem you are facing regarding power management?
Sources:
I have the same issue on an HP Z800 running Ubuntu 22.04
This: "model": "ggml-gpt4all-j" is probably a ggml-gpt4all-j.yaml, right?
You can look into the /models path where the models should reside. There should be a yaml file with the same name. If you instead just have the .bin file there, localai will throw the model at all possible backends until something sticks. I had problems with that, though, since the RAM filled up and everything slowed to a crawl. My solution was to create the .yaml that points to the *.bin file and tells localai which backend to use.
Here is my guanaco.yaml example, which sits in the /models directory together with the model itself:
backend: llama
context_size: 4096
parameters:
  model: guanaco-33B.ggmlv3.q4_0.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7
gpu_layers: 60
roles:
  assistant: 'Assistant:'
  system: 'System:'
  user: 'User:'
template:
  chat: guanaco-chat
  completion: guanaco-completion
Once you have the *.yaml created, you can apply it via the following call:
curl --location 'http://lxdocker:8080/models/apply' \
--header 'Content-Type: application/json' \
--data '{
"url": "file:///models/guanaco.yaml"
}'
Should anything be unclear, read the docs here: https://localai.io/models/
Ok...so I created the yaml file:
name: "gpt4all-j"
description: |
  A commercially licensable model based on GPT-J and trained by Nomic AI on the v0 GPT4All dataset.
license: "Apache 2.0"
urls:
- https://gpt4all.io
config_file: |
  backend: gpt4all-j
  parameters:
    model: ggml-gpt4all-j.bin
    top_k: 80
    temperature: 0.2
    top_p: 0.7
  context_size: 1024
  template:
    completion: "gpt4all-completion"
    chat: gpt4all-chat
files:
- filename: "ggml-gpt4all-j.bin"
  sha256: "acd54f6da1cad7c04c48b785178d686c720dcbe549903032a0945f97b1a43d20"
  uri: "https://gpt4all.io/models/ggml-gpt4all-j.bin"
prompt_templates:
- name: "gpt4all-completion"
  content: |
    Complete the prompt
    ### Prompt:
    {{.Input}}
    ### Response:
- name: "gpt4all-chat"
  content: |
    The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
    ### Prompt:
    {{.Input}}
    ### Response:
I applied it using:
$ curl --location 'http://192.168.2.110:8080/models/apply' --header 'Content-Type: application/json' --data '{
"url": "file:///models/ggml-gpt4all-j.yaml"
}'
{"uuid":"765102a9-2737-11ee-ae41-0242ac120002","status":"http://192.168.2.110:8080/models/jobs/765102a9-2737-11ee-ae41-0242ac120002"}
However, even after this I get the same error:
$ curl http://192.168.2.110:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "ggml-gpt4all-j", "messages": [{"role": "user", "content": "How are you?"}], "temperature": 0.9}'
{"error":{"code":500,"message":"could not load model - all backends returned error: 12 errors occurred:\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\n","type":""}}
Even when I use the following yaml file, I get the same error:
backend: ggml-gpt4all-j
context_size: 1024
parameters:
  model: ggml-gpt4all-j.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7
gpu_layers: 60
roles:
  assistant: 'Assistant:'
  system: 'System:'
  user: 'User:'
template:
  chat: gpt4-chat
  completion: gpt4-completion
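A general debugging tip for the "all backends returned error" case: starting LocalAI with debug logging enabled makes it print the per-backend load errors, which usually narrows down what's actually failing. A minimal sketch, assuming you run LocalAI via Docker (adjust the image tag and paths to your setup):
# DEBUG=true turns on verbose logging; --models-path points at the mounted model directory
docker run -p 8080:8080 -e DEBUG=true -v $PWD/models:/models \
  quay.io/go-skynet/local-ai:latest --models-path /models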
According to this table: https://localai.io/model-compatibility/index.html
the backend name for you to use should be this: gpt4all-j
No ggml in front of it.
You will want to prevent it from trying out all backends by telling it directly which backend to use.
Edit:
Okay, so I see that in your original yaml you actually used the right name for the backend... weird that that didn't work...
But in the second yaml file you showed you would need to change the backend name.
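For reference, a corrected version of that second yaml would look roughly like this (only the backend line changes; I also dropped gpu_layers since I don't believe the gpt4all backend uses it, but double-check the docs):
backend: gpt4all-j
context_size: 1024
parameters:
  model: ggml-gpt4all-j.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7
roles:
  assistant: 'Assistant:'
  system: 'System:'
  user: 'User:'
template:
  chat: gpt4-chat
  completion: gpt4-completion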
Also if I call this curl:
curl --location 'http://lxdocker:8080/models'
I get the following output:
{
  "object": "list",
  "data": [
    {
      "id": "WizardCoder-15B-1.0.ggmlv3.q5_1.bin",
      "object": "model"
    },
    {
      "id": "guanaco-33B.ggmlv3.q4_0.bin",
      "object": "model"
    },
    {
      "id": "guanaco",
      "object": "model"
    },
    {
      "id": "wizardcoder",
      "object": "model"
    }
  ]
}
But then when I call chat completion I use the following name for the model to use:
curl --location 'http://lxdocker:8080/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "guanaco",
  "messages": [
    {
      "role": "user",
      "content": "What'\''s your name?"
    }
  ],
  "temperature": 0.7,
  "stream": true
}'
As you can see, my chat completion call references the yaml file (without the file extension).
You, however, use a name that is similar to your actual model *.bin filename: "model": "ggml-gpt4all-j"
Now I'm not 100% sure what happens, but it might be that with your inference call you are not referencing the configuration (where you tell it which backend to use) but the model file directly, so localai has to try out all the backends you can see listed in the table.
My advice to you:
1. Run the curl that shows you all the models localai is currently aware of. It should list your yaml config file as a "model" object too, just like in my case.
2. Make sure the backend name is set correctly in your configuration.
3. Use the filename (without the file extension) of your yaml for your inference call (see the example call right after this list).
4. Try to make sure your yaml filename and model filename are not too similar, or you might get confused about what's what.
That should make it work! At the very least it shouldn't try every backend there is, which is what this error message indicates: "...all backends returned error..."
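For example, assuming your config file is named gpt4all-j.yaml (and/or sets name: gpt4all-j, as your gallery yaml does), the inference call would become:
curl http://192.168.2.110:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt4all-j",
  "messages": [{"role": "user", "content": "How are you?"}],
  "temperature": 0.9
}'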
Oh, and btw: the first yaml you showed would internally be converted into something like the second yaml you showed. It puts the templating stuff into separate files and only references them from the yaml; that's why my yaml looked so tiny compared to your first one.
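If you want to see what that conversion produces, look in your /models directory after applying the gallery yaml: the prompt templates should land there as plain .tmpl files named after the prompt_templates entries (filenames assumed here). For example, /models/gpt4all-chat.tmpl would contain:
The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
### Prompt:
{{.Input}}
### Response: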
I don't quite understand how these mappings work, but I had the exact same problem, until I tried installing with a custom name:
$ curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
"url": "github:go-skynet/model-gallery/gpt4all-j.yaml",
"name": "gpt-3.5-turbo"
}'
Now suddenly it works:
$ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.9
}'
{"object":"chat.completion","model":"gpt-3.5-turbo","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"I am well, thank you. How are things going with you?"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
So I assume there is something going on with the way these models are named. And even after the success above, I still get no available models:
$ curl http://localhost:8080/models/available
null
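Regarding /models/available returning null: as far as I know, that endpoint only lists models from configured galleries, so without a gallery configured there is nothing to return. A sketch, assuming you run via docker-compose and want the standard model gallery (the GALLERIES variable and URL are taken from the LocalAI docs of that era):
# in your .env / environment
GALLERIES=[{"name":"model-gallery","url":"github:go-skynet/model-gallery/index.yaml"}]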
I have a similar problem, but in my case the output of the command is:
DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:40747): stderr load_model: error 'Model format not supported (no matching implementation found)'
YAMLs...
gpt-3.5-turbo.yaml
backend: gpt4all-j
context_size: 1024
name: gpt-3.5-turbo
parameters:
  model: ggml-gpt4all-j.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7
template:
  chat: gpt4all-chat
  completion: gpt4all-completion
gpt4all-j-groovy.yaml
backend: gpt4all-j
context_size: 1024
name: gpt4all-j-groovy
parameters:
  model: ggml-gpt4all-j-v1.3-groovy.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7
template:
  chat: gpt4all-chat
  completion: gpt4all-completion
COMMANDS...
curl http://192.168.10.2:20000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.2
}'
curl http://192.168.10.2:20000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt4all-j-groovy",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.2
}'
I installed two different variants of gpt4all just for testing.
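"Model format not supported" can also mean the model file itself is bad or truncated, rather than the config being wrong. One quick check is to verify the checksum of the downloaded binary against the sha256 from the gallery yaml earlier in this thread:
$ sha256sum /models/ggml-gpt4all-j.bin
# expected: acd54f6da1cad7c04c48b785178d686c720dcbe549903032a0945f97b1a43d20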
Greetings! Same issue: https://github.com/go-skynet/LocalAI/issues/771#issuecomment-1675508431
LocalAI version: 1.21.0
Environment, CPU architecture, OS, and Version: Proxmox VM (with CPU set to host) Linux localai 5.19.0-46-generic #47~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 21 15:35:31 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Describe the bug: I am unable to get started. Whether I use rebuild=True or not, I get the following in the logs (I followed the instructions in the Readme to get started):
When I run curl http://localhost:8080/v1/models, I get:
However, when I run
after a long wait, I get:
CPU info:
Let me know if I can provide any additional information.