mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
https://localai.io
MIT License

Fresh install, getting error 500: grpc service not ready, on both exllama and AutoGPTQ #923

Open · Subarasheese opened this issue 1 year ago

Subarasheese commented 1 year ago

LocalAI version: Commit 2bacd0180d409b2b8f5c6f1b1ef13ccfda108c48

Environment, CPU architecture, OS, and Version:

CPU Architecture: x86_64, OS: Arch Linux, Version: 6.3.8-arch1-1

Describe the bug

I followed the instructions here:

https://localai.io/model-compatibility/exllama/

This was the output of the curl request:

{
    "error": {
        "code": 500,
        "message": "grpc service not ready",
        "type": ""
    }
}
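
For reference, the request from that page looks roughly like this (reconstructed from memory, so the exact prompt and fields may differ):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "exllama",
  "messages": [{"role": "user", "content": "How are you?"}],
  "temperature": 0.1
}'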

To Reproduce

Fresh install, then follow the instructions on either of these pages:

https://localai.io/model-compatibility/exllama/

https://localai.io/model-compatibility/autogptq/

Expected behavior

The LLM output.

Logs

@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
model name      : AMD Ryzen 7 2700 Eight-Core Processor
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
CPU:    AVX    found OK
CPU:    AVX2   found OK
CPU: no AVX512 found
@@@@@
2:11AM INF Starting LocalAI using 4 threads, with models path: /models
2:11AM INF LocalAI version: v1.24.1 (9cc8d9086580bd2a96f5c96a6b873242879c70bc)

 ┌───────────────────────────────────────────────────┐ 
 │                   Fiber v2.48.0                   │ 
 │               http://127.0.0.1:8080               │ 
 │       (bound on host 0.0.0.0 and port 8080)       │ 
 │                                                   │ 
 │ Handlers ............ 55  Processes ........... 1 │ 
 │ Prefork ....... Disabled  PID ................ 14 │ 
 └───────────────────────────────────────────────────┘ 

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42127: connect: connection refused"
[above line repeated 10 times]
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37499: connect: connection refused"
[above line repeated 10 times]

Additional context

My models dir looks like this:


[privateserver@privateserver models]$ tree
.
├── exllama.yaml
└── TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ
    ├── config.json
    ├── generation_config.json
    ├── huggingface-metadata.txt
    ├── quantize_config.json
    ├── README.md
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    ├── tokenizer.model
    ├── trainer_state.json
    └── Wizard-Vicuna-30B-Uncensored-GPTQ-4bit--1g.act.order.safetensors

2 directories, 12 files

exllama.yaml looks like this:

name: exllama
parameters:
  model: TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ
backend: exllama
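
For comparison, an AutoGPTQ config following the second page linked above would look roughly like this (a sketch; the model_base_name and device values are assumptions to adapt to your own files):

name: autogptq
backend: autogptq
# assumption: base name of the quantized safetensors file, without extension
model_base_name: "Wizard-Vicuna-30B-Uncensored-GPTQ-4bit--1g.act.order"
# assumption: target GPU device
device: "cuda:0"
parameters:
  model: TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ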

loversama commented 1 year ago

Did this ever get solved? I really want to use LocalAI for GPU models (AWQ and GPTQ) but am having zero luck.

virus2016 commented 9 months ago

Same

jesst3r commented 8 months ago

same here

chankwongyin commented 8 months ago

same

qingfenghcy commented 7 months ago

same

jayasimha-raghavan-unskript commented 7 months ago

Same issue here. Any resolution?

QYAdult commented 4 months ago

Has this been resolved?

mudler commented 4 months ago

Please add logs from LocalAI running with the debug flag (--debug).
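
For example (image tag and model path are placeholders):

# with the binary
local-ai --debug
# or via the environment variable
DEBUG=true local-ai
# in Docker
docker run -e DEBUG=true -p 8080:8080 -v /path/to/models:/build/models localai/localai:latest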

maurerle commented 1 week ago

Thank you! I had this issue as well until now. I ran DEBUG=true local-ai after install and found that the gRPC call fails because of ImportError: libcudart.so.12: cannot open shared object file: No such file or directory, which hints that I was missing apt install cuda-toolkit (even though I had run apt install nvidia-cuda-toolkit, which installed Ubuntu's CUDA 11 packages, and nvidia-smi and everything else was already working).
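
A quick, generic way to check whether the CUDA 12 runtime is visible to the dynamic loader:

# list the loader's cached shared libraries and filter for the CUDA runtime;
# after installing cuda-toolkit, libcudart.so.12 should show up here
ldconfig -p | grep libcudart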

I now get the following message on an Nvidia A100: tensor 'token_embd.weight' (q4_0) (and 0 others) cannot be used with preferred buffer type CUDA_Host, using CPU instead, which apparently is not that bad: https://github.com/LostRuins/koboldcpp/issues/1223

But when running in Docker, I still get the error:

user@server:/home/user$ docker run -e DEBUG=True -p 8080:8080 --name local-ai -ti -v /raid/localai/models:/build/models localai/localai:latest-aio-gpu-nvidia-cuda-12
DBG GRPC(Meta Llama 3.1 70B Instruct-127.0.0.1:38389): stderr llama-cpp-fallback: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory
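
As far as I know, libcuda.so.1 comes from the host NVIDIA driver and is injected into the container by the NVIDIA Container Toolkit, so this usually means the container was started without GPU access. Assuming the toolkit is installed on the host, a sketch of the adjusted command:

# --gpus all asks Docker to expose the host GPUs (and libcuda.so.1) inside the container
docker run --gpus all -e DEBUG=True -p 8080:8080 --name local-ai -ti \
  -v /raid/localai/models:/build/models \
  localai/localai:latest-aio-gpu-nvidia-cuda-12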