mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

All-in-One (AIO) Image - Models are not loading due to "grpc service not ready" #2103

Open · Sarmingsteiner opened this issue 4 months ago

Sarmingsteiner commented 4 months ago

LocalAI version: 2.12.4 (latest)

Hi, first of all thank you for your strong efforts! I am trying to run the AIO CPU image on my Synology NAS via Synology Container (i.e. Docker). However, according to the logs, only Stablediffusion seems to load (and it does not work via Nextcloud either). When trying to start a chat via AnythingLLM (the connection to LocalAI is successful), LocalAI does not seem to communicate with the LLM at all (no processing load whatsoever).

I have attached a screenshot of all the files included in the models folder. It looks to me as if the GPT-4 LLM (or rather SLM) files are missing?

[Screenshot: LocalAI model files]

Could you please point out what might be wrong?
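For reference, the plain-Docker equivalent of this Synology Container setup, following the LocalAI AIO documentation, would look roughly like the sketch below (the host volume path is illustrative):

    # Minimal sketch: run the AIO CPU image with a persistent models folder.
    # DEBUG=true enables the verbose DBG log lines that help with issues like this.
    docker run -d --name local-ai -p 8080:8080 \
        -v /volume1/docker/localai/models:/build/models \
        -e DEBUG=true \
        localai/localai:latest-aio-cpu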

And these are the error logs from Synology Container:

2024/04/22 18:13:58 | stdout | 4:13PM INF [stablediffusion] Loads OK
2024/04/22 18:13:54 | stdout | 4:13PM INF Loading model '30f19017f38ab930fb78ec796b84f457' with backend stablediffusion
2024/04/22 18:13:54 | stdout | 4:13PM INF [stablediffusion] Attempting to load
2024/04/22 18:13:54 | stdout | 4:13PM INF [whisper] Fails: grpc service not ready
2024/04/22 18:13:52 | stdout | 4:13PM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:41284: connect: connection refused\""
2024/04/22 18:13:14 | stdout | 4:13PM INF Loading model '30f19017f38ab930fb78ec796b84f457' with backend whisper
2024/04/22 18:13:14 | stdout | 4:13PM INF [whisper] Attempting to load
2024/04/22 18:13:14 | stdout | 4:13PM INF [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
2024/04/22 18:13:12 | stdout | 4:13PM INF Loading model '30f19017f38ab930fb78ec796b84f457' with backend rwkv
2024/04/22 18:13:12 | stdout | 4:13PM INF [rwkv] Attempting to load
2024/04/22 18:13:12 | stdout | 4:13PM INF [bert-embeddings] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
2024/04/22 18:13:10 | stdout | 4:13PM INF Loading model '30f19017f38ab930fb78ec796b84f457' with backend bert-embeddings
2024/04/22 18:13:10 | stdout | 4:13PM INF [bert-embeddings] Attempting to load
2024/04/22 18:13:10 | stdout | 4:13PM INF [gpt4all] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
2024/04/22 18:13:08 | stdout | 4:13PM INF Loading model '30f19017f38ab930fb78ec796b84f457' with backend gpt4all
2024/04/22 18:13:08 | stdout | 4:13PM INF [gpt4all] Attempting to load
2024/04/22 18:13:08 | stdout | 4:13PM INF [llama-ggml] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
2024/04/22 18:13:06 | stdout | 4:13PM INF Loading model '30f19017f38ab930fb78ec796b84f457' with backend llama-ggml
2024/04/22 18:13:06 | stdout | 4:13PM INF [llama-ggml] Attempting to load
2024/04/22 18:13:06 | stdout | 4:13PM INF [llama-cpp] Fails: grpc service not ready
2024/04/22 18:13:04 | stdout | 4:13PM ERR failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:33435: connect: connection refused\""
2024/04/22 18:12:26 | stdout | 4:12PM INF Loading model '30f19017f38ab930fb78ec796b84f457' with backend llama-cpp
2024/04/22 18:12:26 | stdout | 4:12PM INF [llama-cpp] Attempting to load
2024/04/22 18:12:26 | stdout | 4:12PM INF Trying to load the model '30f19017f38ab930fb78ec796b84f457' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/coqui/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/vall-e-x/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/transformers/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/exllama/run.sh, /build/backend/python/bark/run.sh
2024/04/22 18:09:31 | stdout |  
2024/04/22 18:09:31 | stdout |  └───────────────────────────────────────────────────┘
2024/04/22 18:09:31 | stdout |  │ Prefork ....... Disabled  PID ................. 1 │
2024/04/22 18:09:31 | stdout |  │ Handlers ........... 181  Processes ........... 1 │
2024/04/22 18:09:31 | stdout |  │                                                   │
2024/04/22 18:09:31 | stdout |  │       (bound on host 0.0.0.0 and port 8080)       │
2024/04/22 18:09:31 | stdout |  │               http://127.0.0.1:8080               │
2024/04/22 18:09:31 | stdout |  │                   Fiber v2.52.0                   │
2024/04/22 18:09:31 | stdout |  ┌───────────────────────────────────────────────────┐
2024/04/22 18:09:31 | stdout |  
2024/04/22 18:09:30 | stdout | 4:09PM INF core/startup process completed!

shuther commented 4 months ago

Unfortunately I can confirm the problem with the CUDA 12 Docker image. See the extra logs below and one clear error (last log):

curl http://linuxmain.local:8445/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }'

{"error":{"code":500,"message":"rpc error: code = Unknown desc = unimplemented","type":""}}

The file llava-v1.6-mistral-7b.Q5_K_M.gguf is clearly present.
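A quick way to check which model names the server actually exposes is the standard OpenAI-compatible model-listing endpoint (same host and port as the request above):

    curl http://linuxmain.local:8445/v1/models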

localai-docker-api-1  | 8:22AM INF Trying to load the model '5c7cd056ecf9a4bb5b527410b97f48cb' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/transformers/run.sh, /build/backend/python/exllama/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/vall-e-x/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/coqui/run.sh, /build/backend/python/bark/run.sh
localai-docker-api-1  | 8:22AM INF [llama-cpp] Attempting to load
localai-docker-api-1  | 8:22AM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend llama-cpp
localai-docker-api-1  | 8:22AM DBG Loading model in memory from file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
localai-docker-api-1  | 8:22AM DBG Loading Model 5c7cd056ecf9a4bb5b527410b97f48cb with gRPC (file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb) (backend: llama-cpp): {backendString:llama-cpp model:5c7cd056ecf9a4bb5b527410b97f48cb threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000400000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-docker-api-1  | 8:22AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
localai-docker-api-1  | 8:22AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:37173'
localai-docker-api-1  | 8:22AM DBG GRPC Service state dir: /tmp/go-processmanager56333377
localai-docker-api-1  | 8:22AM DBG GRPC Service Started
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stdout Server listening on 127.0.0.1:37173
localai-docker-api-1  | 8:22AM DBG GRPC Service Ready
localai-docker-api-1  | 8:22AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:5c7cd056ecf9a4bb5b527410b97f48cb ContextSize:4096 Seed:9748343 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/5c7cd056ecf9a4bb5b527410b97f48cb Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /build/models/5c7cd056ecf9a4bb5b527410b97f48cb (version GGUF V3 (latest))
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv   0:                       general.architecture str              = llama
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv   1:                               general.name str              = jeffq
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv   4:                          llama.block_count u32              = 32
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv  11:                          general.file_type u32              = 18
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32032]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32032]   = [0.000000, 0.000000, 0.000000, 0.0000...
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32032]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 32000
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv  18:               tokenizer.ggml.add_bos_token bool             = true
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv  19:               tokenizer.ggml.add_eos_token bool             = false
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% for message in messages %}{{'<|im_...
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - kv  21:               general.quantization_version u32              = 2
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - type  f32:   65 tensors
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_loader: - type q6_K:  226 tensors
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_vocab: special tokens definition check successful ( 291/32032 ).
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: format           = GGUF V3 (latest)
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: arch             = llama
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: vocab type       = SPM
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_vocab          = 32032
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_merges         = 0
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_ctx_train      = 32768
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_embd           = 4096
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_head           = 32
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_head_kv        = 8
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_layer          = 32
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_rot            = 128
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_embd_head_k    = 128
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_embd_head_v    = 128
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_gqa            = 4
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_embd_k_gqa     = 1024
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_embd_v_gqa     = 1024
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: f_norm_eps       = 0.0e+00
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: f_clamp_kqv      = 0.0e+00
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: f_max_alibi_bias = 0.0e+00
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: f_logit_scale    = 0.0e+00
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_ff             = 14336
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_expert         = 0
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_expert_used    = 0
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: causal attn      = 1
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: pooling type     = 0
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: rope type        = 0
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: rope scaling     = linear
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: freq_base_train  = 10000.0
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: freq_scale_train = 1
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: n_yarn_orig_ctx  = 32768
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: rope_finetuned   = unknown
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: ssm_d_conv       = 0
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: ssm_d_inner      = 0
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: ssm_d_state      = 0
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: ssm_dt_rank      = 0
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: model type       = 7B
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: model ftype      = Q6_K
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: model params     = 7.24 B
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: model size       = 5.53 GiB (6.56 BPW)
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: general.name     = jeffq
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: BOS token        = 1 '<s>'
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: EOS token        = 32000 '<|im_end|>'
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: UNK token        = 0 '<unk>'
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_print_meta: LF token         = 13 '<0x0A>'
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr ggml_cuda_init: found 1 CUDA devices:
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr   Device 0: NVIDIA GeForce RTX 2060, compute capability 7.5, VMM: yes
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llm_load_tensors: ggml ctx size =    0.22 MiB
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr ggml_backend_cuda_buffer_type_alloc_buffer: allocating 5563.66 MiB on device 0: cudaMalloc failed: out of memory
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_model_load: error loading model: unable to allocate backend buffer
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_load_model_from_file: failed to load model
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stderr llama_init_from_gpt_params: error: failed to load model '/build/models/5c7cd056ecf9a4bb5b527410b97f48cb'
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:37173): stdout {"timestamp":1714033364,"level":"ERROR","function":"load_model","line":464,"message":"unable to load model","model":"/build/models/5c7cd056ecf9a4bb5b527410b97f48cb"}
localai-docker-api-1  | 8:22AM INF [llama-cpp] Fails: could not load model: rpc error: code = Canceled desc =
localai-docker-api-1  | 8:22AM INF [llama-ggml] Attempting to load
localai-docker-api-1  | 8:22AM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend llama-ggml
localai-docker-api-1  | 8:22AM DBG Loading model in memory from file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
localai-docker-api-1  | 8:22AM DBG Loading Model 5c7cd056ecf9a4bb5b527410b97f48cb with gRPC (file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb) (backend: llama-ggml): {backendString:llama-ggml model:5c7cd056ecf9a4bb5b527410b97f48cb threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000400000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-docker-api-1  | 8:22AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-ggml
localai-docker-api-1  | 8:22AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:44527'
localai-docker-api-1  | 8:22AM DBG GRPC Service state dir: /tmp/go-processmanager1198198229
localai-docker-api-1  | 8:22AM DBG GRPC Service Started
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:44527): stderr 2024/04/25 08:22:44 gRPC Server listening at 127.0.0.1:44527
localai-docker-api-1  | 8:22AM DBG GRPC Service Ready
localai-docker-api-1  | 8:22AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:5c7cd056ecf9a4bb5b527410b97f48cb ContextSize:4096 Seed:9748343 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/5c7cd056ecf9a4bb5b527410b97f48cb Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:44527): stderr create_gpt_params: loading model /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:44527): stderr ggml_init_cublas: found 1 CUDA devices:
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:44527): stderr   Device 0: NVIDIA GeForce RTX 2060, compute capability 7.5
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:44527): stderr llama.cpp: loading model from /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:44527): stderr error loading model: unknown (magic, version) combination: 46554747, 00000003; is this really a GGML file?
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:44527): stderr llama_load_model_from_file: failed to load model
localai-docker-api-1  | 8:22AM INF [llama-ggml] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
localai-docker-api-1  | 8:22AM INF [gpt4all] Attempting to load
localai-docker-api-1  | 8:22AM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend gpt4all
localai-docker-api-1  | 8:22AM DBG Loading model in memory from file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
localai-docker-api-1  | 8:22AM DBG Loading Model 5c7cd056ecf9a4bb5b527410b97f48cb with gRPC (file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb) (backend: gpt4all): {backendString:gpt4all model:5c7cd056ecf9a4bb5b527410b97f48cb threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000400000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-docker-api-1  | 8:22AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/gpt4all
localai-docker-api-1  | 8:22AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:42833'
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:44527): stderr llama_init_from_gpt_params: error: failed to load model '/build/models/5c7cd056ecf9a4bb5b527410b97f48cb'
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:44527): stderr load_binding_model: error: unable to load model
localai-docker-api-1  | 8:22AM DBG GRPC Service state dir: /tmp/go-processmanager189527023
localai-docker-api-1  | 8:22AM DBG GRPC Service Started
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:42833): stderr 2024/04/25 08:22:46 gRPC Server listening at 127.0.0.1:42833
localai-docker-api-1  | 8:22AM DBG GRPC Service Ready
localai-docker-api-1  | 8:22AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:5c7cd056ecf9a4bb5b527410b97f48cb ContextSize:4096 Seed:9748343 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/5c7cd056ecf9a4bb5b527410b97f48cb Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:42833): stderr load_model: error 'Model format not supported (no matching implementation found)'
localai-docker-api-1  | 8:22AM INF [gpt4all] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
localai-docker-api-1  | 8:22AM INF [bert-embeddings] Attempting to load
localai-docker-api-1  | 8:22AM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend bert-embeddings
localai-docker-api-1  | 8:22AM DBG Loading model in memory from file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
localai-docker-api-1  | 8:22AM DBG Loading Model 5c7cd056ecf9a4bb5b527410b97f48cb with gRPC (file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb) (backend: bert-embeddings): {backendString:bert-embeddings model:5c7cd056ecf9a4bb5b527410b97f48cb threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000400000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-docker-api-1  | 8:22AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/bert-embeddings
localai-docker-api-1  | 8:22AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:43999'
localai-docker-api-1  | 8:22AM DBG GRPC Service state dir: /tmp/go-processmanager2749133716
localai-docker-api-1  | 8:22AM DBG GRPC Service Started
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43999): stderr 2024/04/25 08:22:48 gRPC Server listening at 127.0.0.1:43999
localai-docker-api-1  | [127.0.0.1]:38860 200 - GET /readyz
localai-docker-api-1  | 8:22AM DBG GRPC Service Ready
localai-docker-api-1  | 8:22AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:5c7cd056ecf9a4bb5b527410b97f48cb ContextSize:4096 Seed:9748343 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/5c7cd056ecf9a4bb5b527410b97f48cb Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43999): stderr bert_load_from_file: invalid model file '/build/models/5c7cd056ecf9a4bb5b527410b97f48cb' (bad magic)
localai-docker-api-1  | 8:22AM INF [bert-embeddings] Fails: could not load model: rpc error: code = Unknown desc = failed loading model
localai-docker-api-1  | 8:22AM INF [rwkv] Attempting to load
localai-docker-api-1  | 8:22AM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend rwkv
localai-docker-api-1  | 8:22AM DBG Loading model in memory from file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
localai-docker-api-1  | 8:22AM DBG Loading Model 5c7cd056ecf9a4bb5b527410b97f48cb with gRPC (file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb) (backend: rwkv): {backendString:rwkv model:5c7cd056ecf9a4bb5b527410b97f48cb threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000400000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:43999): stderr bert_bootstrap: failed to load model from '/build/models/5c7cd056ecf9a4bb5b527410b97f48cb'
localai-docker-api-1  | 8:22AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/rwkv
localai-docker-api-1  | 8:22AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:38739'
localai-docker-api-1  | 8:22AM DBG GRPC Service state dir: /tmp/go-processmanager1462042835
localai-docker-api-1  | 8:22AM DBG GRPC Service Started
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:38739): stderr 2024/04/25 08:22:50 gRPC Server listening at 127.0.0.1:38739
localai-docker-api-1  | 8:22AM DBG GRPC Service Ready
localai-docker-api-1  | 8:22AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:5c7cd056ecf9a4bb5b527410b97f48cb ContextSize:4096 Seed:9748343 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/5c7cd056ecf9a4bb5b527410b97f48cb Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:38739): stderr
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:38739): stderr /build/sources/go-rwkv/rwkv.cpp/rwkv_file_format.inc:93: header.magic == 0x67676d66
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:38739): stderr Invalid file header
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:38739): stderr /build/sources/go-rwkv/rwkv.cpp/rwkv_model_loading.inc:158: rwkv_fread_file_header(file.file, model.header)
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:38739): stderr
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:38739): stderr /build/sources/go-rwkv/rwkv.cpp/rwkv.cpp:63: rwkv_load_model_from_file(file_path, *ctx->model)
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:38739): stderr 2024/04/25 08:22:52 InitFromFile /build/models/5c7cd056ecf9a4bb5b527410b97f48cb failed
localai-docker-api-1  | 8:22AM INF [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
localai-docker-api-1  | 8:22AM INF [whisper] Attempting to load
localai-docker-api-1  | 8:22AM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend whisper
localai-docker-api-1  | 8:22AM DBG Loading model in memory from file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
localai-docker-api-1  | 8:22AM DBG Loading Model 5c7cd056ecf9a4bb5b527410b97f48cb with gRPC (file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb) (backend: whisper): {backendString:whisper model:5c7cd056ecf9a4bb5b527410b97f48cb threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000400000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-docker-api-1  | 8:22AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper
localai-docker-api-1  | 8:22AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:39147'
localai-docker-api-1  | 8:22AM DBG GRPC Service state dir: /tmp/go-processmanager2008519706
localai-docker-api-1  | 8:22AM DBG GRPC Service Started
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:39147): stderr 2024/04/25 08:22:52 gRPC Server listening at 127.0.0.1:39147
localai-docker-api-1  | 8:22AM DBG GRPC Service Ready
localai-docker-api-1  | 8:22AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:5c7cd056ecf9a4bb5b527410b97f48cb ContextSize:4096 Seed:9748343 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/5c7cd056ecf9a4bb5b527410b97f48cb Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:39147): stderr whisper_init_from_file_with_params_no_state: loading model from '/build/models/5c7cd056ecf9a4bb5b527410b97f48cb'
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:39147): stderr whisper_model_load: loading model
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:39147): stderr whisper_model_load: invalid model data (bad magic)
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:39147): stderr whisper_init_with_params_no_state: failed to load model
localai-docker-api-1  | 8:22AM INF [whisper] Fails: could not load model: rpc error: code = Unknown desc = unable to load model
localai-docker-api-1  | 8:22AM INF [stablediffusion] Attempting to load
localai-docker-api-1  | 8:22AM INF Loading model '5c7cd056ecf9a4bb5b527410b97f48cb' with backend stablediffusion
localai-docker-api-1  | 8:22AM DBG Loading model in memory from file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb
localai-docker-api-1  | 8:22AM DBG Loading Model 5c7cd056ecf9a4bb5b527410b97f48cb with gRPC (file: /build/models/5c7cd056ecf9a4bb5b527410b97f48cb) (backend: stablediffusion): {backendString:stablediffusion model:5c7cd056ecf9a4bb5b527410b97f48cb threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000400000 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-docker-api-1  | 8:22AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/stablediffusion
localai-docker-api-1  | 8:22AM DBG GRPC Service for 5c7cd056ecf9a4bb5b527410b97f48cb will be running at: '127.0.0.1:44657'
localai-docker-api-1  | 8:22AM DBG GRPC Service state dir: /tmp/go-processmanager2981194633
localai-docker-api-1  | 8:22AM DBG GRPC Service Started
localai-docker-api-1  | 8:22AM DBG GRPC(5c7cd056ecf9a4bb5b527410b97f48cb-127.0.0.1:44657): stderr 2024/04/25 08:22:56 gRPC Server listening at 127.0.0.1:44657
localai-docker-api-1  | 8:22AM DBG GRPC Service Ready
localai-docker-api-1  | 8:22AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:5c7cd056ecf9a4bb5b527410b97f48cb ContextSize:4096 Seed:9748343 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/5c7cd056ecf9a4bb5b527410b97f48cb Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
localai-docker-api-1  | 8:22AM INF [stablediffusion] Loads OK
localai-docker-api-1  | [172.30.0.1]:59638 500 - POST /v1/chat/completions
clip.use_gelu bool             = false
localai-docker-api-1  | 8:25AM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:35553): stdout clip_model_load: - type  f32:  236 tensors
localai-docker-api-1  | 8:25AM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:35553): stdout clip_model_load: - type  f16:  142 tensors
localai-docker-api-1  | 8:25AM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:35553): stdout clip_model_load: CLIP using CUDA backend
localai-docker-api-1  | 8:25AM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:35553): stdout clip_model_load: text_encoder:   0
localai-docker-api-1  | 8:25AM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:35553): stdout clip_model_load: vision_encoder: 1
localai-docker-api-1  | 8:25AM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:35553): stdout clip_model_load: llava_projector:  1
localai-docker-api-1  | 8:25AM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:35553): stdout clip_model_load: model size:     595.50 MB
localai-docker-api-1  | 8:25AM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:35553): stdout clip_model_load: metadata size:  0.14 MB
localai-docker-api-1  | 8:25AM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:35553): stdout clip_model_load: params backend buffer size =  595.50 MB (378 tensors)
localai-docker-api-1  | 8:25AM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:35553): stdout clip_model_load: compute allocated memory: 32.89 MB
localai-docker-api-1  | 8:25AM DBG GRPC(llava-v1.6-mistral-7b.Q5_K_M.gguf-127.0.0.1:35553): stdout {"timestamp":1714033533,"level":"ERROR","function":"load_model","line":464,"message":"unable to load model","model":"/build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf"}
localai-docker-api-1  | [127.0.0.1]:46082 200 - GET /readyz

One error:

localai-docker-api-1  | 8:27AM DBG Loading Model DreamShaper_8_pruned.safetensors with gRPC (file: /build/models/DreamShaper_8_pruned.safetensors) (backend: diffusers): {backendString:diffusers model:DreamShaper_8_pruned.safetensors threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000245a00 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-docker-api-1  | 8:27AM DBG Loading external backend: /build/backend/python/diffusers/run.sh
localai-docker-api-1  | 8:27AM DBG Loading GRPC Process: /build/backend/python/diffusers/run.sh
localai-docker-api-1  | 8:27AM DBG GRPC Service for DreamShaper_8_pruned.safetensors will be running at: '127.0.0.1:38145'
localai-docker-api-1  | 8:27AM DBG GRPC Service state dir: /tmp/go-processmanager2561185379
localai-docker-api-1  | 8:27AM DBG GRPC Service Started
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr /opt/conda/envs/diffusers/lib/python3.11/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr   warnings.warn(
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr There was a problem when trying to move your cache:
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr   File "/opt/conda/envs/diffusers/lib/python3.11/site-packages/transformers/utils/hub.py", line 1271, in <module>
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr     move_cache()
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr   File "/opt/conda/envs/diffusers/lib/python3.11/site-packages/transformers/utils/hub.py", line 1180, in move_cache
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr     cached_files = get_all_cached_files(cache_dir=cache_dir)
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr   File "/opt/conda/envs/diffusers/lib/python3.11/site-packages/transformers/utils/hub.py", line 1079, in get_all_cached_files
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr     url = metadata["url"]
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr           ~~~~~~~~^^^^^^^
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr KeyError: 'url'
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr
localai-docker-api-1  | 8:27AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:38145): stderr Please file an issue at https://github.com/huggingface/transformers/issues/new/choose and copy paste this whole message and we will do our best to help.
localai-bot commented 4 months ago

It seems that you're experiencing an issue with the model loading process. The error message indicates that there is a problem with the DreamShaper_8_pruned.safetensors model file, specifically within the model's metadata. The KeyError suggests that the metadata file is missing the 'url' parameter.

To resolve this, you could try the following steps:

  1. Check that you have the correct file path and permissions for the DreamShaper_8_pruned.safetensors model file.
  2. Verify that the model file was downloaded and saved correctly from the Hugging Face model repository.
  3. Redownload the model from the Hugging Face model repository and save it again to ensure that you have a complete and correct file (a quick way to sanity-check the download is sketched after this list).
  4. Ensure that the environment variables (such as HF_HOME) are correctly set and point to the appropriate locations on your system.
  5. If the issue persists, please file an issue on the Hugging Face Transformers repository on GitHub, providing the details of your problem and the complete error log. They will be able to help you further diagnose and resolve the issue.
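A minimal sketch of such a sanity check (paths follow the container layout seen in the logs above; compare the checksum against whatever the model's "Files" page on Hugging Face lists):

    # Size should match what the Hugging Face repo lists for the file
    ls -lh /build/models/DreamShaper_8_pruned.safetensors

    # Compare against the SHA256 published on the model's "Files" page
    sha256sum /build/models/DreamShaper_8_pruned.safetensors

    # A GGUF model must start with the ASCII magic "GGUF"
    head -c 4 /build/models/llava-v1.6-mistral-7b.Q5_K_M.gguf; echo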
shuther commented 4 months ago

The file DreamShaper_8_pruned.safetensors is present, and the curl request below worked as expected:

curl http://linuxmain.local:8445/v1/images/generations \
    -H "Content-Type: application/json" -d '{
        "prompt": "A cute baby sea otter",
        "size": "256x256"
      }'
Aisuko commented 4 months ago

Hi guys. For @Sarmingsteiner: from the log, the model loading failed. Please pick a model from the model gallery at https://localai.io/models/ and test whether it can be loaded in your environment; it can also be installed through the API, as sketched below.
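If helpful, a gallery model can be installed through the API with something like this (the /models/apply endpoint is described in the LocalAI documentation; the model id below is only an example):

    curl http://localhost:8080/models/apply \
        -H "Content-Type: application/json" \
        -d '{ "id": "model-gallery@bert-embeddings" }'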

For @shuther: it is an error from the Hugging Face Transformers library. We import diffusers, and it depends on transformers. It is related to the URL. I assume you should check the model name; it should be the same as on the Hugging Face model card (if your model is not listed in the Model Gallery).