mudler / LocalAI

:robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Generates text, audio, video, and images, with voice-cloning capabilities.
https://localai.io
MIT License

Cannot get this to run on Proxmox #773

Open arsaboo opened 12 months ago

arsaboo commented 12 months ago

LocalAI version: 1.21.0

Environment, CPU architecture, OS, and Version: Proxmox VM (with CPU set to host) Linux localai 5.19.0-46-generic #47~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 21 15:35:31 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug: I am unable to get started. Whether I use rebuild=True or not, I get the following in the logs (I followed the instructions in the README to get started):

1:54PM DBG no galleries to load
1:54PM INF Starting LocalAI using 4 threads, with models path: /models
1:54PM INF LocalAI version: v1.21.0 (fb6cce487fb53d9de1c1a6b3414261f52b5cdbe0)

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.48.0                   │
 │               http://127.0.0.1:8080               │
 │       (bound on host 0.0.0.0 and port 8080)       │
 │                                                   │
 │ Handlers ............ 31  Processes ........... 1 │
 │ Prefork ....... Disabled  PID .............. 7164 │
 └───────────────────────────────────────────────────┘

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41937: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:43565: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35255: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40915: connect: connection refused"

When I run curl http://localhost:8080/v1/models, I get:

{"object":"list","data":[{"id":"ggml-gpt4all-j","object":"model"}]}

However, when I run

curl http://192.168.2.110:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-gpt4all-j",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9
   }'

after a long wait, I get:

{"error":{"code":500,"message":"could not load model - all backends returned error: 12 errors occurred:\n\t* rpc error: code = Unavailable desc = error reading from server: EOF\n\t* rpc error: code = Unavailable desc = error reading from server: EOF\n\t* rpc error: code = Unavailable desc = error reading from server: EOF\n\t* rpc error: code = Unavailable desc = error reading from server: EOF\n\t* rpc error: code = Unavailable desc = error reading from server: EOF\n\t* rpc error: code = Unavailable desc = error reading from server: EOF\n\t* rpc error: code = Unavailable desc = error reading from server: EOF\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unavailable desc = error reading from server: EOF\n\t* rpc error: code = Unavailable desc = error reading from server: EOF\n\t* rpc error: code = Unavailable desc = error reading from server: EOF\n\t* rpc error: code = Unknown desc = failed loading model\n\n","type":""}}

CPUinfo:

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 151
model name      : 12th Gen Intel(R) Core(TM) i9-12900T
stepping        : 2
microcode       : 0x1e
cpu MHz         : 1382.400
cache size      : 16384 KB
physical id     : 0
siblings        : 3
core id         : 0
cpu cores       : 3
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 31
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni arat umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_capabilities
vmx flags       : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml tsc_scaling
bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb
bogomips        : 2764.80
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 151
model name      : 12th Gen Intel(R) Core(TM) i9-12900T
stepping        : 2
microcode       : 0x1e
cpu MHz         : 1382.400
cache size      : 16384 KB
physical id     : 0
siblings        : 3
core id         : 1
cpu cores       : 3
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 31
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni arat umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_capabilities
vmx flags       : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml tsc_scaling
bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb
bogomips        : 2764.80
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 151
model name      : 12th Gen Intel(R) Core(TM) i9-12900T
stepping        : 2
microcode       : 0x1e
cpu MHz         : 1382.400
cache size      : 16384 KB
physical id     : 0
siblings        : 3
core id         : 2
cpu cores       : 3
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 31
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni arat umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_capabilities
vmx flags       : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml tsc_scaling
bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb
bogomips        : 2764.80
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Let me know if I can provide any additional information.

localai-bot commented 12 months ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!

But... I can also be funny or helpful :smile_cat: and I can generally provide good tips or point to places to look in the documentation or in the code, based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

:warning::warning::warning::warning::warning:

I see that the issue is related to power management. Is there any specific question or problem you are facing regarding power management?

Sources:

simonbermudez commented 12 months ago

I have the same issue running on an HP Z800 running Ubuntu 22.04

emakkus commented 12 months ago

This: "model": "ggml-gpt4all-j" is propably a ggml-gpt4all-j.yaml right?

You can look into the /models path where the models should reside. There should be a yaml file with the same name. If you instead just have the .bin file there, LocalAI will throw the model at all possible backends until something sticks. I had problems with that, though, since RAM filled up and everything slowed to a crawl. My solution was to create a .yaml that points to the *.bin file and tells LocalAI which backend to use.

Here is my guanaco.yaml example, that is in the /models directory together with the model itself:

backend: llama
context_size: 4096
parameters:
  model: guanaco-33B.ggmlv3.q4_0.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7

gpu_layers: 60
roles:
  assistant: 'Assistant:'
  system: 'System:'
  user: 'User:'
template:
  chat: guanaco-chat
  completion: guanaco-completion

Once you have the *.yaml created, you can apply it via the following call:

curl --location 'http://lxdocker:8080/models/apply' \
--header 'Content-Type: application/json' \
--data '{
    "url": "file:///models/guanaco.yaml"
}'

Should anything be unclear, read the docs here: https://localai.io/models/

arsaboo commented 12 months ago

Ok...so I created the yaml file:

name: "gpt4all-j"
description: |
    A commercially licensable model based on GPT-J and trained by Nomic AI on the v0 GPT4All dataset.
license: "Apache 2.0"
urls:
- https://gpt4all.io
config_file: |
    backend: gpt4all-j
    parameters:
      model: ggml-gpt4all-j.bin
      top_k: 80
      temperature: 0.2
      top_p: 0.7
    context_size: 1024
    template:
      completion: "gpt4all-completion"
      chat: gpt4all-chat

files:
    - filename: "ggml-gpt4all-j.bin"
      sha256: "acd54f6da1cad7c04c48b785178d686c720dcbe549903032a0945f97b1a43d20"
      uri: "https://gpt4all.io/models/ggml-gpt4all-j.bin"

prompt_templates:
- name: "gpt4all-completion"
  content: |
    Complete the prompt
    ### Prompt:
    {{.Input}}
    ### Response:
- name: "gpt4all-chat"
  content: |
    The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
    ### Prompt:
    {{.Input}}
    ### Response:

I applied the same using:

$ curl --location 'http://192.168.2.110:8080/models/apply' --header 'Content-Type: application/json' --data '{
    "url": "file:///models/ggml-gpt4all-j.yaml"
}'
{"uuid":"765102a9-2737-11ee-ae41-0242ac120002","status":"http://192.168.2.110:8080/models/jobs/765102a9-2737-11ee-ae41-0242ac120002"}

However, even after this I get the same error:

$ curl http://192.168.2.110:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "ggml-gpt4all-j", "messages": [{"role": "user", "content": "How are you?"}], "temperature": 0.9}'
{"error":{"code":500,"message":"could not load model - all backends returned error: 12 errors occurred:\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\t* rpc error: code = Unknown desc = failed loading model\n\n","type":""}}

arsaboo commented 12 months ago

Even when I use the following yaml file, I get the same error:


backend: ggml-gpt4all-j
context_size: 1024
parameters:
  model: ggml-gpt4all-j.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7

gpu_layers: 60
roles:
  assistant: 'Assistant:'
  system: 'System:'
  user: 'User:'
template:
  chat: gpt4-chat
  completion: gpt4-completion

emakkus commented 12 months ago

According to this table: https://localai.io/model-compatibility/index.html

the backend name for you to use should be this: gpt4all-j

No ggml in front of it.

You will want to keep it from trying out every backend by directly telling it which backend to use.

Edit:

Okay, I see that in your original yaml you actually used the right name for the backend... weird that it didn't work...

But in the second yaml file you showed you would need to change the backend name.
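
For clarity, that would just mean changing the backend line in the second yaml (a minimal sketch, keeping everything else as you had it):

backend: gpt4all-j        # instead of: ggml-gpt4all-j
context_size: 1024
parameters:
  model: ggml-gpt4all-j.bin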

Also if I call this curl:

curl --location 'http://lxdocker:8080/models'

I get the following output:

{
    "object": "list",
    "data": [
        {
            "id": "WizardCoder-15B-1.0.ggmlv3.q5_1.bin",
            "object": "model"
        },
        {
            "id": "guanaco-33B.ggmlv3.q4_0.bin",
            "object": "model"
        },
        {
            "id": "guanaco",
            "object": "model"
        },
        {
            "id": "wizardcoder",
            "object": "model"
        }
    ]
}

But then, when I call chat completion, I use the following name for the model:

curl --location 'http://lxdocker:8080/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
     "model": "guanaco",
     "messages": [
         {
             "role": "user",
             "content": "What'\''s your name?"
         }
     ],
     "temperature": 0.7,
     "stream": true
}'

As you can see, my chat completion call references the yaml file name (without the file extension).

You, however, use a name that matches your actual model *.bin filename (minus the extension): "model": "ggml-gpt4all-j"

Now, I'm not 100% sure what happens, but it might be that your inference call isn't referencing the configuration (where you tell it which backend to use) but the model file directly, so LocalAI has to try out all the backends you can see listed in the table.

My advice to you:

1. Run the curl that shows you all the models LocalAI is currently aware of. It should show your yaml config file as a "model" object too, just like in my case.

2. Make sure the backend name is correctly set in your configuration.

3. Use the filename (without the file extension) of your yaml for your inference call.

4. Try to make sure your yaml filename and model filename are not too similar, or it gets confusing which is which.

That should make it work! At the very least, it shouldn't try every available backend, which is what this error message indicates: "...all backends returned error..."
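
Putting that together for your setup, a sketch might look like the following. The gpt4all-j.yaml filename and the name: field are just examples chosen to keep the config name distinct from the .bin file, and it assumes the gpt4all-chat/gpt4all-completion templates already exist in /models:

# /models/gpt4all-j.yaml
name: gpt4all-j
backend: gpt4all-j
context_size: 1024
parameters:
  model: ggml-gpt4all-j.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7
template:
  chat: gpt4all-chat
  completion: gpt4all-completion

Then list what LocalAI sees and call the config by its name:

curl http://192.168.2.110:8080/v1/models

curl http://192.168.2.110:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "gpt4all-j",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9
   }'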

Oh, and by the way: the first yaml you showed would internally be converted into something like the second yaml you showed. It puts the templating stuff into separate files and only references them from the yaml; that's why my yaml looked so tiny compared to your first one.
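
Concretely, and assuming standard gallery behaviour, applying your first (gallery-style) yaml should leave /models looking roughly like this:

/models/gpt4all-j.yaml (or similar)   # a plain model config generated from the config_file section
/models/gpt4all-chat.tmpl             # the "gpt4all-chat" prompt template
/models/gpt4all-completion.tmpl       # the "gpt4all-completion" prompt template
/models/ggml-gpt4all-j.bin            # the downloaded model file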

brujoand commented 11 months ago

I don't quite understand how these mappings work, but I had the exact same problem, until I tried installing with a custom name:

$ curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
      "url": "github:go-skynet/model-gallery/gpt4all-j.yaml",
      "name": "gpt-3.5-turbo"
   }'

Now suddenly it works:

$ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "gpt-3.5-turbo",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9
   }'
{"object":"chat.completion","model":"gpt-3.5-turbo","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"I am well, thank you. How are things going with you?"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

So I assume there is something going on with the way these models are named. And even after the success above, I still get no available models:

$ curl http://localhost:8080/models/available
null
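
(For what it's worth, /models/available lists models from configured galleries, and the startup log earlier in this thread shows "no galleries to load", which would explain the null here. Per the docs, galleries are configured through an environment variable roughly like the one below; the exact gallery URL is only an example:)

GALLERIES='[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}]'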

SmartPhoneLover commented 11 months ago

I have a similar problem, but in my case the output of the command is:

DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:40747): stderr load_model: error 'Model format not supported (no matching implementation found)'


YAMLs...

gpt-3.5-turbo.yaml

backend: gpt4all-j
context_size: 1024
name: gpt-3.5-turbo
parameters:
  model: ggml-gpt4all-j.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7
template:
  chat: gpt4all-chat
  completion: gpt4all-completion

gpt4all-j-groovy.yaml

backend: gpt4all-j
context_size: 1024
name: gpt4all-j-groovy
parameters:
  model: ggml-gpt4all-j-v1.3-groovy.bin
  temperature: 0.2
  top_k: 80
  top_p: 0.7
template:
  chat: gpt4all-chat
  completion: gpt4all-completion

COMMAND...

curl http://192.168.10.2:20000/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "gpt-3.5-turbo", 
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.2
   }'
curl http://192.168.10.2:20000/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "gpt4all-j-groovy", 
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.2
   }'

I installed two different variants of gpt4all just for testing.

artshade commented 11 months ago

I have a similar problem, but in my case the output of the command is:

DBG GRPC(ggml-gpt4all-j.bin-127.0.0.1:40747): stderr load_model: error 'Model format not supported (no matching implementation found)'

Greetings! Same issue: https://github.com/go-skynet/LocalAI/issues/771#issuecomment-1675508431