mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

Could not load model: SIGILL: illegal instruction #1447

Closed · Taronyuu closed this issue 9 months ago

Taronyuu commented 9 months ago

LocalAI version:

quay.io/go-skynet/local-ai:master-cublas-cuda12-core

Environment, CPU architecture, OS, and Version:

Linux user-Z68X-UD3P-B3 6.2.0-39-generic #40~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 16 10:53:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

version: '3.6'

services:
  api:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    image: quay.io/go-skynet/local-ai:master-cublas-cuda12-core
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
      - ./images/:/tmp/generated/images/
    command: ["/usr/bin/local-ai" ]
nvidia-smi
Fri Dec 15 22:13:12 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        On  | 00000000:01:00.0 Off |                  N/A |
|  0%   43C    P8              25W / 350W |      3MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Describe the bug

Every .gguf model that I try fails with the error shown below. I've downloaded TheBloke's CodeLlama-13B (GGUF) and it failed; I've also tried the 7B Llama model, the Luna model (as shown in the docs), and now TinyLlama, and they all fail. I know the CUDA integration with Docker is working as expected, because the NVIDIA sample workload and Axolotl (for training) both run fine inside Docker.

Furthermore, if I remove the backend altogether, LocalAI tries every backend in turn, but none of them work.

To Reproduce

Execute the curl below; every model fails the same way.

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9
   }'
user@user-Z68X-UD3P-B3:~/LocalAI/models$ pwd
/home/user/LocalAI/models
user@user-Z68X-UD3P-B3:~/LocalAI/models$ ls -al
total 4388344
drwxrwxr-x  2 user user       4096 Dec 15 21:29 .
drwxrwxr-x 17 user user       4096 Dec 15 21:29 ..
-rw-r--r--  1 root   root          253 Dec 15 21:23 tinyllama.yaml
-rw-r--r--  1 root   root    667822976 Dec 15 21:23 tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf
cat models/tinyllama.yaml
context_size: 1024
name: thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf
parameters:
  model: tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf
  temperature: 0.2
  top_k: 80
  top_p: 0.7
template:
  chat: chat
  completion: completion
backend: llama
f16: true
gpu_layers: 30

I've also tried llama-stable as backend, but that didn't help.
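
For reference, that attempt was a one-line change in the model config; everything else in tinyllama.yaml stayed the same:

# tinyllama.yaml, with only the backend line swapped:
backend: llama-stable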

Expected behavior

I would expect the model to return a response, or at the very least a reasonable error. I don't think the error shown is directly related to LocalAI.

Logs

docker compose up
[+] Running 22/22
 ✔ api 21 layers [⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿]      0B/0B      Pulled                                                                                              756.3s
   ✔ d1da99c2f148 Already exists                                                                                                                               0.0s
   ✔ 577ff23cfe55 Already exists                                                                                                                               0.0s
   ✔ c7b1e60e9d5a Already exists                                                                                                                               0.0s
   ✔ 714cd879eb99 Already exists                                                                                                                               0.0s
   ✔ 2bd8b252ec0a Pull complete                                                                                                                               28.0s
   ✔ 6ef0790763b3 Pull complete                                                                                                                                0.7s
   ✔ 44bdf02e4a01 Pull complete                                                                                                                               40.7s
   ✔ 77491a53669e Pull complete                                                                                                                                1.9s
   ✔ 05ae0f4a5fe4 Pull complete                                                                                                                                3.4s
   ✔ 4f4fb700ef54 Pull complete                                                                                                                                4.1s
   ✔ 14601617e69c Pull complete                                                                                                                              632.7s
   ✔ 6e3a4bd4a7f0 Pull complete                                                                                                                              154.3s
   ✔ 63661a91fb39 Pull complete                                                                                                                               42.1s
   ✔ c414c2c4015d Pull complete                                                                                                                               43.5s
   ✔ ffae41ac74b5 Pull complete                                                                                                                               46.2s
   ✔ 7bbc1461a8b5 Pull complete                                                                                                                              603.3s
   ✔ 5801e1ec273c Pull complete                                                                                                                              354.6s
   ✔ 30952fbd13a3 Pull complete                                                                                                                              511.2s
   ✔ 8f06b863e302 Pull complete                                                                                                                              582.3s
   ✔ 5b07b6742079 Pull complete                                                                                                                              588.2s
   ✔ ea25b4a47834 Pull complete                                                                                                                              594.2s
WARN[0756] Found orphan containers ([localai-local-ai-1]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[+] Running 1/1
 ✔ Container localai-api-1  Recreated                                                                                                                          7.0s
Attaching to localai-api-1
localai-api-1  | @@@@@
localai-api-1  | Skipping rebuild
localai-api-1  | @@@@@
localai-api-1  | If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
localai-api-1  | If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
localai-api-1  | CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
localai-api-1  | see the documentation at: https://localai.io/basics/build/index.html
localai-api-1  | Note: See also https://github.com/go-skynet/LocalAI/issues/288
localai-api-1  | @@@@@
localai-api-1  | CPU info:
localai-api-1  | model name : Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
localai-api-1  | flags      : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
localai-api-1  | CPU:    AVX    found OK
localai-api-1  | CPU: no AVX2   found
localai-api-1  | CPU: no AVX512 found
localai-api-1  | @@@@@
localai-api-1  | 9:03PM INF Starting LocalAI using 2 threads, with models path: /models
localai-api-1  | 9:03PM INF LocalAI version: fb6a5bc (fb6a5bc620cc39657e03ef958b09230acdf977a0)
localai-api-1  | 9:03PM DBG Model: thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf (config: {PredictionOptions:{Model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf F16:true Threads:0 Debug:false Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat:chat ChatMessage: Completion:completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:30 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false})
localai-api-1  | 9:03PM DBG Extracting backend assets files to /tmp/localai/backend_data
localai-api-1  |
localai-api-1  |  ┌───────────────────────────────────────────────────┐
localai-api-1  |  │                   Fiber v2.50.0                   │
localai-api-1  |  │               http://127.0.0.1:8080               │
localai-api-1  |  │       (bound on host 0.0.0.0 and port 8080)       │
localai-api-1  |  │                                                   │
localai-api-1  |  │ Handlers ............ 74  Processes ........... 1 │
localai-api-1  |  │ Prefork ....... Disabled  PID ................ 14 │
localai-api-1  |  └───────────────────────────────────────────────────┘
localai-api-1  |
localai-api-1  | 9:04PM DBG Request received:
localai-api-1  | 9:04PM DBG Configuration read: &{PredictionOptions:{Model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf F16:true Threads:2 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat:chat ChatMessage: Completion:completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:30 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false}
localai-api-1  | 9:04PM DBG Parameters: &{PredictionOptions:{Model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf F16:true Threads:2 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat:chat ChatMessage: Completion:completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:30 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false}
localai-api-1  | 9:04PM DBG Prompt (before templating): How are you?
localai-api-1  | 9:04PM DBG Template found, input modified to: Below is an instruction that describes a task. Write a response that appropriately completes the request.
localai-api-1  |
localai-api-1  | ### Instruction:
localai-api-1  | How are you?
localai-api-1  |
localai-api-1  | ### Response:
localai-api-1  | 9:04PM DBG Prompt (after templating): Below is an instruction that describes a task. Write a response that appropriately completes the request.
localai-api-1  |
localai-api-1  | ### Instruction:
localai-api-1  | How are you?
localai-api-1  |
localai-api-1  | ### Response:
localai-api-1  | 9:04PM DBG Loading model llama from tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf
localai-api-1  | 9:04PM DBG Loading model in memory from file: /models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf
localai-api-1  | 9:04PM DBG Loading Model tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf with gRPC (file: /models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf) (backend: llama): {backendString:llama model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf threads:2 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004281e0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-api-1  | 9:04PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama
localai-api-1  | 9:04PM DBG GRPC Service for tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf will be running at: '127.0.0.1:42591'
localai-api-1  | 9:04PM DBG GRPC Service state dir: /tmp/go-processmanager721992787
localai-api-1  | 9:04PM DBG GRPC Service Started
localai-api-1  | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42591: connect: connection refused"
localai-api-1  | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42591: connect: connection refused"
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr 2023/12/15 21:04:11 gRPC Server listening at 127.0.0.1:42591
localai-api-1  | 9:04PM DBG GRPC Service Ready
localai-api-1  | 9:04PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf ContextSize:1024 Seed:0 NBatch:512 F16Memory:true MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:30 MainGPU: TensorSplit: Threads:2 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr SIGILL: illegal instruction
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr PC=0x8a06bc m=5 sigcode=2
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr signal arrived during cgo execution
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr instruction bytes: 0xc4 0xe3 0x7d 0x39 0x8c 0x24 0x18 0x3 0x0 0x0 0x1 0x66 0x89 0x84 0x24 0x0
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr goroutine 50 [syscall]:
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.cgocall(0x823240, 0xc0000f54d8)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0000f54b0 sp=0xc0000f5478 pc=0x41960b
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0x7fa350000cd0, 0x400, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x1e, 0x200, ...)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  _cgo_gotypes.go:267 +0x4f fp=0xc0000f54d8 sp=0xc0000f54b0 pc=0x815b2f
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr github.com/go-skynet/go-llama%2ecpp.New({0xc00002c0c0, 0x2c}, {0xc00010bd00, 0x9, 0x938460?})
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /build/sources/go-llama/llama.go:39 +0x385 fp=0xc0000f56e8 sp=0xc0000f54d8 pc=0x816525
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr main.(*LLM).Load(0xc0000a4618, 0xc00012ed20)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /build/backend/go/llm/llama/llama.go:87 +0xc9c fp=0xc0000f5900 sp=0xc0000f56e8 pc=0x82049c
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr github.com/go-skynet/LocalAI/pkg/grpc.(*server).LoadModel(0xc000098d50, {0xc00012ed20?, 0x50c886?}, 0x0?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /build/pkg/grpc/server.go:50 +0xe6 fp=0xc0000f59b0 sp=0xc0000f5900 pc=0x81dce6
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr github.com/go-skynet/LocalAI/pkg/grpc/proto._Backend_LoadModel_Handler({0x9a9900?, 0xc000098d50}, {0xa90570, 0xc000024cc0}, 0xc00010a380, 0x0)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /build/pkg/grpc/proto/backend_grpc.pb.go:264 +0x169 fp=0xc0000f5a08 sp=0xc0000f59b0 pc=0x80afa9
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001d61e0, {0xa90570, 0xc00028e120}, {0xa93a98, 0xc000007ba0}, 0xc0002a0000, 0xc0001dec90, 0x11895b0, 0x0)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1343 +0xe03 fp=0xc0000f5df0 sp=0xc0000f5a08 pc=0x7f3f23
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr google.golang.org/grpc.(*Server).handleStream(0xc0001d61e0, {0xa93a98, 0xc000007ba0}, 0xc0002a0000)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1737 +0xc4c fp=0xc0000f5f78 sp=0xc0000f5df0 pc=0x7f8e8c
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr google.golang.org/grpc.(*Server).serveStreams.func1.1()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:986 +0x86 fp=0xc0000f5fe0 sp=0xc0000f5f78 pc=0x7f1e26
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.goexit()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000f5fe8 sp=0xc0000f5fe0 pc=0x47c961
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 7
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:997 +0x145
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr goroutine 1 [IO wait]:
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.gopark(0x42b828?, 0x7fa36032a8f8?, 0x78?, 0x9b?, 0x4e8e3d?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0001c9b08 sp=0xc0001c9ae8 pc=0x44ddae
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.netpollblock(0xc0001c9b98?, 0x418da6?, 0x0?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc0001c9b40 sp=0xc0001c9b08 pc=0x446857
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr internal/poll.runtime_pollWait(0x7fa3603b1e58, 0x72)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc0001c9b60 sp=0xc0001c9b40 pc=0x477885
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr internal/poll.(*pollDesc).wait(0xc00019a600?, 0x0?, 0x0)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0001c9b88 sp=0xc0001c9b60 pc=0x4e1aa7
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr internal/poll.(*pollDesc).waitRead(...)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr internal/poll.(*FD).Accept(0xc00019a600)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/internal/poll/fd_unix.go:611 +0x2ac fp=0xc0001c9c30 sp=0xc0001c9b88 pc=0x4e6f8c
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr net.(*netFD).accept(0xc00019a600)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/net/fd_unix.go:172 +0x29 fp=0xc0001c9ce8 sp=0xc0001c9c30 pc=0x642969
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr net.(*TCPListener).accept(0xc0000da4a0)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/net/tcpsock_posix.go:152 +0x1e fp=0xc0001c9d10 sp=0xc0001c9ce8 pc=0x65993e
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr net.(*TCPListener).Accept(0xc0000da4a0)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/net/tcpsock.go:315 +0x30 fp=0xc0001c9d40 sp=0xc0001c9d10 pc=0x658af0
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr google.golang.org/grpc.(*Server).Serve(0xc0001d61e0, {0xa8fb80?, 0xc0000da4a0})
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:852 +0x462 fp=0xc0001c9e80 sp=0xc0001c9d40 pc=0x7f0a82
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr github.com/go-skynet/LocalAI/pkg/grpc.StartServer({0x7ffe142a4aa5?, 0xc00009c130?}, {0xa941c0?, 0xc0000a4618})
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /build/pkg/grpc/server.go:178 +0x17d fp=0xc0001c9f10 sp=0xc0001c9e80 pc=0x81f6dd
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr main.main()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /build/backend/go/llm/llama/main.go:20 +0x85 fp=0xc0001c9f40 sp=0xc0001c9f10 pc=0x822a45
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.main()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc0001c9fe0 sp=0xc0001c9f40 pc=0x44d95b
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.goexit()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0001c9fe8 sp=0xc0001c9fe0 pc=0x47c961
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr goroutine 2 [force gc (idle)]:
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00004cfa8 sp=0xc00004cf88 pc=0x44ddae
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.goparkunlock(...)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:404
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.forcegchelper()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc00004cfe0 sp=0xc00004cfa8 pc=0x44dc33
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.goexit()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00004cfe8 sp=0xc00004cfe0 pc=0x47c961
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr created by runtime.init.6 in goroutine 1
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:310 +0x1a
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr goroutine 3 [GC sweep wait]:
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00004d778 sp=0xc00004d758 pc=0x44ddae
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.goparkunlock(...)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:404
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.bgsweep(0x0?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/mgcsweep.go:280 +0x94 fp=0xc00004d7c8 sp=0xc00004d778 pc=0x439cd4
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.gcenable.func1()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/mgc.go:200 +0x25 fp=0xc00004d7e0 sp=0xc00004d7c8 pc=0x42ee85
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.goexit()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00004d7e8 sp=0xc00004d7e0 pc=0x47c961
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr created by runtime.gcenable in goroutine 1
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/mgc.go:200 +0x66
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr goroutine 4 [GC scavenge wait]:
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.gopark(0xc000034070?, 0xa88d58?, 0x1?, 0x0?, 0xc0000071e0?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00004df70 sp=0xc00004df50 pc=0x44ddae
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.goparkunlock(...)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:404
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.(*scavengerState).park(0x11d2aa0)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00004dfa0 sp=0xc00004df70 pc=0x4375a9
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.bgscavenge(0x0?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc00004dfc8 sp=0xc00004dfa0 pc=0x437b3c
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.gcenable.func2()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/mgc.go:201 +0x25 fp=0xc00004dfe0 sp=0xc00004dfc8 pc=0x42ee25
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.goexit()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00004dfe8 sp=0xc00004dfe0 pc=0x47c961
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr created by runtime.gcenable in goroutine 1
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/mgc.go:201 +0xa5
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr goroutine 18 [finalizer wait]:
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.gopark(0x198?, 0x9d3860?, 0x1?, 0xef?, 0x0?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00004c620 sp=0xc00004c600 pc=0x44ddae
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.runfinq()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc00004c7e0 sp=0xc00004c620 pc=0x42dea7
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.goexit()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00004c7e8 sp=0xc00004c7e0 pc=0x47c961
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr created by runtime.createfing in goroutine 1
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/mfinal.go:163 +0x3d
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr goroutine 5 [select]:
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.gopark(0xc000165f00?, 0x2?, 0x1e?, 0x0?, 0xc000165ed4?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000165d80 sp=0xc000165d60 pc=0x44ddae
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.selectgo(0xc000165f00, 0xc000165ed0, 0x78b1f6?, 0x0, 0xc000150000?, 0x1)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000165ea0 sp=0xc000165d80 pc=0x45d805
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc000100550, 0x1)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:418 +0x113 fp=0xc000165f30 sp=0xc000165ea0 pc=0x76a053
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc0001401c0)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:552 +0x86 fp=0xc000165f90 sp=0xc000165f30 pc=0x76a766
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func2()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:336 +0xd5 fp=0xc000165fe0 sp=0xc000165f90 pc=0x780fb5
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.goexit()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000165fe8 sp=0xc000165fe0 pc=0x47c961
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 34
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:333 +0x1acc
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr goroutine 6 [select]:
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.gopark(0xc000048770?, 0x4?, 0x0?, 0x69?, 0xc0000486c0?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000048528 sp=0xc000048508 pc=0x44ddae
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.selectgo(0xc000048770, 0xc0000486b8, 0xf?, 0x0, 0xc000048690?, 0x1)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000048648 sp=0xc000048528 pc=0x45d805
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr google.golang.org/grpc/internal/transport.(*http2Server).keepalive(0xc000007ba0)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:1152 +0x225 fp=0xc0000487c8 sp=0xc000048648 pc=0x788265
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr google.golang.org/grpc/internal/transport.NewServerTransport.func4()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:339 +0x25 fp=0xc0000487e0 sp=0xc0000487c8 pc=0x780ea5
localai-api-1  | [172.19.0.1]:55268 500 - POST /v1/chat/completions
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.goexit()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000487e8 sp=0xc0000487e0 pc=0x47c961
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 34
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:339 +0x1b0e
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr goroutine 7 [IO wait]:
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.gopark(0x11eac00?, 0xb?, 0x0?, 0x0?, 0x6?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000061aa0 sp=0xc000061a80 pc=0x44ddae
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.netpollblock(0x4c6d18?, 0x418da6?, 0x0?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc000061ad8 sp=0xc000061aa0 pc=0x446857
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr internal/poll.runtime_pollWait(0x7fa3603b1d60, 0x72)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc000061af8 sp=0xc000061ad8 pc=0x477885
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr internal/poll.(*pollDesc).wait(0xc000316000?, 0xc000148000?, 0x0)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000061b20 sp=0xc000061af8 pc=0x4e1aa7
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr internal/poll.(*pollDesc).waitRead(...)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr internal/poll.(*FD).Read(0xc000316000, {0xc000148000, 0x8000, 0x8000})
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000061bb8 sp=0xc000061b20 pc=0x4e2d9a
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr net.(*netFD).Read(0xc000316000, {0xc000148000?, 0x1060100000000?, 0x8?})
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000061c00 sp=0xc000061bb8 pc=0x640945
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr net.(*conn).Read(0xc000318000, {0xc000148000?, 0xc000061c90?, 0x3?})
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/net/net.go:179 +0x45 fp=0xc000061c48 sp=0xc000061c00 pc=0x651065
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr net.(*TCPConn).Read(0x0?, {0xc000148000?, 0xc000061ca0?, 0x46bcad?})
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  <autogenerated>:1 +0x25 fp=0xc000061c78 sp=0xc000061c48 pc=0x663805
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr bufio.(*Reader).Read(0xc0000767e0, {0xc000158040, 0x9, 0xc1574db2f3113641?})
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/bufio/bufio.go:244 +0x197 fp=0xc000061cb0 sp=0xc000061c78 pc=0x5bbed7
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr io.ReadAtLeast({0xa8d5e0, 0xc0000767e0}, {0xc000158040, 0x9, 0x9}, 0x9)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/io/io.go:335 +0x90 fp=0xc000061cf8 sp=0xc000061cb0 pc=0x4c0ed0
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr io.ReadFull(...)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/io/io.go:354
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr golang.org/x/net/http2.readFrameHeader({0xc000158040, 0x9, 0xc000288030?}, {0xa8d5e0?, 0xc0000767e0?})
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/golang.org/x/net@v0.17.0/http2/frame.go:237 +0x65 fp=0xc000061d48 sp=0xc000061cf8 pc=0x756ac5
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr golang.org/x/net/http2.(*Framer).ReadFrame(0xc000158000)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/golang.org/x/net@v0.17.0/http2/frame.go:498 +0x85 fp=0xc000061df0 sp=0xc000061d48 pc=0x757205
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams(0xc000007ba0, 0x1?)
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:636 +0x145 fp=0xc000061f00 sp=0xc000061df0 pc=0x784105
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr google.golang.org/grpc.(*Server).serveStreams(0xc0001d61e0, {0xa93a98?, 0xc000007ba0})
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:979 +0x1c2 fp=0xc000061f80 sp=0xc000061f00 pc=0x7f1bc2
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr google.golang.org/grpc.(*Server).handleRawConn.func1()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:920 +0x45 fp=0xc000061fe0 sp=0xc000061f80 pc=0x7f1425
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr runtime.goexit()
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000061fe8 sp=0xc000061fe0 pc=0x47c961
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 34
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr  /go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:919 +0x185
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr rax    0x0
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr rbx    0xab7900
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr rcx    0x7fa35bff51a0
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr rdx    0x7fa3d2c616d8
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr rdi    0x7fa3d2c616c8
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr rsi    0x7fa3d2c59e38
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr rbp    0x7fa35bff52c0
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr rsp    0x7fa35bff4f40
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr r8     0x0
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr r9     0x7fa350000080
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr r10    0xfffffffffffffd8c
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr r11    0x7fa3d2b64990
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr r12    0x1
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr r13    0x7fa35bff5060
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr r14    0x7fa35bff4ff0
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr r15    0x7fa35bff5160
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr rip    0x8a06bc
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr rflags 0x10246
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr cs     0x33
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr fs     0x0
localai-api-1  | 9:04PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:42591): stderr gs     0x0
localai-api-1  | [127.0.0.1]:40982 200 - GET /readyz
localai-api-1  | [127.0.0.1]:53278 200 - GET /readyz

Additional context

Taronyuu commented 9 months ago

So far I have not been able to get any model to output text, but I am at the point where nvidia-smi shows GPU utilisation. I've spent days on this, and now, mere hours after creating this issue, I can reply to it myself :-)

So, long story short, there are three things I ran into.

  1. First, my CPU is an older one (an i5-2500K), so I had to rebuild with the additional flags described here: https://localai.io/basics/build/ (CPU flagset compatibility). A quick way to check your CPU's flags is sketched right after this list.
  2. Second, I had to rebuild the CUDA container too, with the same added flags.
  3. As mentioned in this comment (https://github.com/mudler/LocalAI/issues/840#issuecomment-1764327199), I also had to add NVIDIA_VISIBLE_DEVICES: ALL to the environment.
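
For point 1, this one-liner lists which of the relevant SIMD extensions your CPU reports (just a sketch, assuming GNU grep; the flag names are the ones from /proc/cpuinfo):

grep -ow 'avx512[a-z]*\|avx2\|avx\|fma\|f16c' /proc/cpuinfo | sort -u
# On my i5-2500K this prints only "avx" (matching the CPU info in the logs
# above), which is why a binary built with AVX2/FMA/F16C dies with SIGILL.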

This is my current docker compose file:

version: '3.8'

services:
  local-ai:
    image: quay.io/go-skynet/local-ai:master-cublas-cuda12-core
    ports:
      - "8080:8080"
    environment:
      DEBUG: "true"
      MODELS_PATH: "/models"
      THREADS: "1"
      NVIDIA_VISIBLE_DEVICES: "all"
      REBUILD: "true"
      CMAKE_ARGS: "-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_AVX=OFF -DLLAMA_FMA=OFF"
    volumes:
      - $PWD/models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
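
Two sanity checks that are useful at this point (sketches, assuming the service is named local-ai as in the compose file above):

# Is the GPU actually visible from inside the container?
docker compose exec local-ai nvidia-smi
# Is the API up and serving the model list?
curl http://localhost:8080/v1/models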

Now my GPU is being used, but no output is generated yet. See the logs below. I'll look into it more tomorrow, but if anyone has any ideas, please let me know!

nvidia-smi
Sat Dec 16 01:20:59 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        On  | 00000000:01:00.0 Off |                  N/A |
| 53%   53C    P2             224W / 350W |   1171MiB / 24576MiB |     61%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    134394      C   ..._data/backend-assets/grpc/llama-cpp     1162MiB |
+---------------------------------------------------------------------------------------+
docker compose up
WARN[0000] Found orphan containers ([localai-api-1]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[+] Running 1/0
 ✔ Container localai-local-ai-1  Created                                                                                                                       0.0s
Attaching to localai-local-ai-1
localai-local-ai-1  | touch get-sources
localai-local-ai-1  | go mod edit -replace github.com/nomic-ai/gpt4all/gpt4all-bindings/golang=/build/sources/gpt4all/gpt4all-bindings/golang
localai-local-ai-1  | go mod edit -replace github.com/go-skynet/go-ggml-transformers.cpp=/build/sources/go-ggml-transformers
localai-local-ai-1  | go mod edit -replace github.com/donomii/go-rwkv.cpp=/build/sources/go-rwkv
localai-local-ai-1  | go mod edit -replace github.com/ggerganov/whisper.cpp=/build/sources/whisper.cpp
localai-local-ai-1  | go mod edit -replace github.com/ggerganov/whisper.cpp/bindings/go=/build/sources/whisper.cpp/bindings/go
localai-local-ai-1  | go mod edit -replace github.com/go-skynet/go-bert.cpp=/build/sources/go-bert
localai-local-ai-1  | go mod edit -replace github.com/mudler/go-stable-diffusion=/build/sources/go-stable-diffusion
localai-local-ai-1  | go mod edit -replace github.com/mudler/go-piper=/build/sources/go-piper
localai-local-ai-1  | go mod download
localai-local-ai-1  | touch prepare-sources
localai-local-ai-1  | touch prepare
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/langchain-huggingface ./backend/go/llm/langchain/
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/go-ggml-transformers LIBRARY_PATH=/build/sources/go-ggml-transformers \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/falcon-ggml ./backend/go/llm/falcon-ggml/
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/go-bert LIBRARY_PATH=/build/sources/go-bert \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/bert-embeddings ./backend/go/llm/bert/
localai-local-ai-1  | go mod edit -replace github.com/go-skynet/go-llama.cpp=/build/sources/go-llama
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/go-llama LIBRARY_PATH=/build/sources/go-llama \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/llama ./backend/go/llm/llama/
localai-local-ai-1  | # github.com/go-skynet/go-llama.cpp
localai-local-ai-1  | binding.cpp: In function 'void llama_binding_free_model(void*)':
localai-local-ai-1  | binding.cpp:809:5: warning: possible problem detected in invocation of 'operator delete' [-Wdelete-incomplete]
localai-local-ai-1  |   809 |     delete ctx->model;
localai-local-ai-1  |       |     ^~~~~~~~~~~~~~~~~
localai-local-ai-1  | binding.cpp:809:17: warning: invalid use of incomplete type 'struct llama_model'
localai-local-ai-1  |   809 |     delete ctx->model;
localai-local-ai-1  |       |            ~~~~~^~~~~
localai-local-ai-1  | In file included from sources/go-llama/llama.cpp/common/common.h:5,
localai-local-ai-1  |                  from binding.cpp:1:
localai-local-ai-1  | sources/go-llama/llama.cpp/llama.h:60:12: note: forward declaration of 'struct llama_model'
localai-local-ai-1  |    60 |     struct llama_model;
localai-local-ai-1  |       |            ^~~~~~~~~~~
localai-local-ai-1  | binding.cpp:809:5: note: neither the destructor nor the class-specific 'operator delete' will be called, even if they are declared when the class is defined
localai-local-ai-1  |   809 |     delete ctx->model;
localai-local-ai-1  |       |     ^~~~~~~~~~~~~~~~~
localai-local-ai-1  | cp -rfv backend/cpp/llama/grpc-server backend-assets/grpc/llama-cpp
localai-local-ai-1  | 'backend/cpp/llama/grpc-server' -> 'backend-assets/grpc/llama-cpp'
localai-local-ai-1  | go mod edit -replace github.com/go-skynet/go-llama.cpp=/build/sources/go-llama-ggml
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/go-llama-ggml LIBRARY_PATH=/build/sources/go-llama-ggml \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/llama-ggml ./backend/go/llm/llama-ggml/
localai-local-ai-1  | # github.com/go-skynet/go-llama.cpp
localai-local-ai-1  | binding.cpp: In function 'void llama_binding_free_model(void*)':
localai-local-ai-1  | binding.cpp:613:5: warning: possible problem detected in invocation of 'operator delete' [-Wdelete-incomplete]
localai-local-ai-1  |   613 |     delete ctx->model;
localai-local-ai-1  |       |     ^~~~~~~~~~~~~~~~~
localai-local-ai-1  | binding.cpp:613:17: warning: invalid use of incomplete type 'struct llama_model'
localai-local-ai-1  |   613 |     delete ctx->model;
localai-local-ai-1  |       |            ~~~~~^~~~~
localai-local-ai-1  | In file included from sources/go-llama-ggml/llama.cpp/examples/common.h:5,
localai-local-ai-1  |                  from binding.cpp:1:
localai-local-ai-1  | sources/go-llama-ggml/llama.cpp/llama.h:70:12: note: forward declaration of 'struct llama_model'
localai-local-ai-1  |    70 |     struct llama_model;
localai-local-ai-1  |       |            ^~~~~~~~~~~
localai-local-ai-1  | binding.cpp:613:5: note: neither the destructor nor the class-specific 'operator delete' will be called, even if they are declared when the class is defined
localai-local-ai-1  |   613 |     delete ctx->model;
localai-local-ai-1  |       |     ^~~~~~~~~~~~~~~~~
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/gpt4all/gpt4all-bindings/golang/ LIBRARY_PATH=/build/sources/gpt4all/gpt4all-bindings/golang/ \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/gpt4all ./backend/go/llm/gpt4all/
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/go-ggml-transformers LIBRARY_PATH=/build/sources/go-ggml-transformers \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/dolly ./backend/go/llm/dolly/
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/go-ggml-transformers LIBRARY_PATH=/build/sources/go-ggml-transformers \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/gpt2 ./backend/go/llm/gpt2/
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/go-ggml-transformers LIBRARY_PATH=/build/sources/go-ggml-transformers \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/gptj ./backend/go/llm/gptj/
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/go-ggml-transformers LIBRARY_PATH=/build/sources/go-ggml-transformers \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/gptneox ./backend/go/llm/gptneox/
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/go-ggml-transformers LIBRARY_PATH=/build/sources/go-ggml-transformers \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/mpt ./backend/go/llm/mpt/
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/go-ggml-transformers LIBRARY_PATH=/build/sources/go-ggml-transformers \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/replit ./backend/go/llm/replit/
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/go-ggml-transformers LIBRARY_PATH=/build/sources/go-ggml-transformers \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/starcoder ./backend/go/llm/starcoder/
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" C_INCLUDE_PATH=/build/sources/go-rwkv LIBRARY_PATH=/build/sources/go-rwkv \
localai-local-ai-1  | go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o backend-assets/grpc/rwkv ./backend/go/llm/rwkv
localai-local-ai-1  | I local-ai build info:
localai-local-ai-1  | I BUILD_TYPE: cublas
localai-local-ai-1  | I GO_TAGS:
localai-local-ai-1  | I LD_FLAGS: -X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"
localai-local-ai-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o local-ai ./
localai-local-ai-1  | 11:52PM INF Starting LocalAI using 1 threads, with models path: /models
localai-local-ai-1  | 11:52PM INF LocalAI version: fb6a5bc (fb6a5bc620cc39657e03ef958b09230acdf977a0)
localai-local-ai-1  | 11:52PM DBG Model: thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf (config: {PredictionOptions:{Model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf F16:true Threads:0 Debug:false Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat:chat ChatMessage: Completion:completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:9 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false})
localai-local-ai-1  | 11:52PM DBG Extracting backend assets files to /tmp/localai/backend_data
localai-local-ai-1  |
localai-local-ai-1  |  ┌───────────────────────────────────────────────────┐
localai-local-ai-1  |  │                   Fiber v2.50.0                   │
localai-local-ai-1  |  │               http://127.0.0.1:8080               │
localai-local-ai-1  |  │       (bound on host 0.0.0.0 and port 8080)       │
localai-local-ai-1  |  │                                                   │
localai-local-ai-1  |  │ Handlers ............ 74  Processes ........... 1 │
localai-local-ai-1  |  │ Prefork ....... Disabled  PID .............. 1092 │
localai-local-ai-1  |  └───────────────────────────────────────────────────┘
localai-local-ai-1  |
localai-local-ai-1  | 11:52PM DBG Request received:
localai-local-ai-1  | 11:52PM DBG `input`: &{PredictionOptions:{Model:thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf Language: N:0 TopP:0 TopK:0 Temperature:0.7 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Context:context.Background.WithCancel Cancel:0x4a99c0 File: ResponseFormat:{Type:} Size: Prompt:A long time ago in a galaxy far, far away Instruction: Input:<nil> Stop:<nil> Messages:[] Functions:[] FunctionCall:<nil> Stream:false Mode:0 Step:0 Grammar: JSONFunctionGrammarObject:<nil> Backend: ModelBaseName:}
localai-local-ai-1  | 11:52PM DBG Parameter Config: &{PredictionOptions:{Model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.7 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf F16:true Threads:1 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat:chat ChatMessage: Completion:completion Edit: Functions:} PromptStrings:[A long time ago in a galaxy far, far away] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:9 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false}
localai-local-ai-1  | 11:52PM DBG Template found, input modified to: A long time ago in a galaxy far, far away
localai-local-ai-1  |
localai-local-ai-1  | 11:52PM DBG Loading model 'tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf' greedly from all the available backends: llama-cpp, llama-ggml, llama, gpt4all, gptneox, bert-embeddings, falcon-ggml, gptj, gpt2, dolly, mpt, replit, starcoder, rwkv, whisper, stablediffusion, piper, /build/backend/python/transformers/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/bark/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/vall-e-x/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/exllama/run.sh, /build/backend/python/petals/run.sh
localai-local-ai-1  | 11:52PM DBG [llama-cpp] Attempting to load
localai-local-ai-1  | 11:52PM DBG Loading model llama-cpp from tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf
localai-local-ai-1  | 11:52PM DBG Loading model in memory from file: /models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf
localai-local-ai-1  | 11:52PM DBG Loading Model tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf with gRPC (file: /models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf) (backend: llama-cpp): {backendString:llama-cpp model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf threads:1 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc00025e5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-local-ai-1  | 11:52PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
localai-local-ai-1  | 11:52PM DBG GRPC Service for tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf will be running at: '127.0.0.1:34953'
localai-local-ai-1  | 11:52PM DBG GRPC Service state dir: /tmp/go-processmanager2515077067
localai-local-ai-1  | 11:52PM DBG GRPC Service Started
localai-local-ai-1  | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34953: connect: connection refused"
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stdout Server listening on 127.0.0.1:34953
localai-local-ai-1  | 11:52PM DBG GRPC Service Ready
localai-local-ai-1  | 11:52PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf ContextSize:1024 Seed:0 NBatch:512 F16Memory:true MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:9 MainGPU: TensorSplit: Threads:1 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr ggml_init_cublas: found 1 CUDA devices:
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: loaded meta data with 20 key-value pairs and 201 tensors from /models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf (version GGUF V2)
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  2048, 32003,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor    1:              blk.0.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor    2:              blk.0.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor    3:              blk.0.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor    4:         blk.0.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor    5:            blk.0.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor    6:              blk.0.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor    7:            blk.0.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor    8:           blk.0.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor    9:            blk.0.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   10:              blk.1.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   11:              blk.1.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   12:              blk.1.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   13:         blk.1.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   14:            blk.1.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   15:              blk.1.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   16:            blk.1.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   17:           blk.1.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   18:            blk.1.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   19:              blk.2.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   20:              blk.2.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   21:              blk.2.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   22:         blk.2.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   23:            blk.2.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   24:              blk.2.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   25:            blk.2.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   26:           blk.2.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   27:            blk.2.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   28:              blk.3.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   29:              blk.3.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   30:              blk.3.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   31:         blk.3.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   32:            blk.3.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   33:              blk.3.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   34:            blk.3.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   35:           blk.3.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   36:            blk.3.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   37:              blk.4.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   38:              blk.4.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   39:              blk.4.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   40:         blk.4.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   41:            blk.4.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   42:              blk.4.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   43:            blk.4.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   44:           blk.4.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   45:            blk.4.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   46:              blk.5.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   47:              blk.5.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   48:              blk.5.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   49:         blk.5.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   50:            blk.5.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   51:              blk.5.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   52:            blk.5.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   53:           blk.5.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   54:            blk.5.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   55:              blk.6.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   56:              blk.6.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   57:              blk.6.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   58:         blk.6.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   59:            blk.6.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   60:              blk.6.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   61:            blk.6.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   62:           blk.6.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   63:            blk.6.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   64:              blk.7.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   65:              blk.7.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   66:              blk.7.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   67:         blk.7.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   68:            blk.7.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   69:              blk.7.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   70:            blk.7.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   71:           blk.7.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   72:            blk.7.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   73:              blk.8.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   74:              blk.8.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   75:              blk.8.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   76:         blk.8.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   77:            blk.8.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   78:              blk.8.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   79:            blk.8.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   80:           blk.8.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   81:            blk.8.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   82:              blk.9.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   83:              blk.9.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   84:              blk.9.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   85:         blk.9.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   86:            blk.9.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   87:              blk.9.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   88:            blk.9.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   89:           blk.9.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   90:            blk.9.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   91:             blk.10.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   92:             blk.10.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   93:             blk.10.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   94:        blk.10.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   95:           blk.10.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   96:             blk.10.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   97:           blk.10.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   98:          blk.10.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor   99:           blk.10.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  100:             blk.11.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  101:             blk.11.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  102:             blk.11.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  103:        blk.11.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  104:           blk.11.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  105:             blk.11.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  106:           blk.11.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  107:          blk.11.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  108:           blk.11.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  109:             blk.12.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  110:             blk.12.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  111:             blk.12.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  112:        blk.12.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  113:           blk.12.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  114:             blk.12.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  115:           blk.12.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  116:          blk.12.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  117:           blk.12.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  118:             blk.13.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  119:             blk.13.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  120:             blk.13.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  121:        blk.13.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  122:           blk.13.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  123:             blk.13.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  124:           blk.13.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  125:          blk.13.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  126:           blk.13.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  127:             blk.14.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  128:             blk.14.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  129:             blk.14.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  130:        blk.14.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  131:           blk.14.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  132:             blk.14.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  133:           blk.14.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  134:          blk.14.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  135:           blk.14.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  136:             blk.15.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  137:             blk.15.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  138:             blk.15.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  139:        blk.15.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  140:           blk.15.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  141:             blk.15.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  142:           blk.15.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  143:          blk.15.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  144:           blk.15.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  145:             blk.16.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  146:             blk.16.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  147:             blk.16.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  148:        blk.16.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  149:           blk.16.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  150:             blk.16.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  151:           blk.16.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  152:          blk.16.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  153:           blk.16.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  154:             blk.17.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  155:             blk.17.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  156:             blk.17.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  157:        blk.17.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  158:           blk.17.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  159:             blk.17.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  160:           blk.17.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  161:          blk.17.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  162:           blk.17.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  163:             blk.18.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  164:             blk.18.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  165:             blk.18.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  166:        blk.18.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  167:           blk.18.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  168:             blk.18.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  169:           blk.18.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  170:          blk.18.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  171:           blk.18.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  172:             blk.19.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  173:             blk.19.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  174:             blk.19.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  175:        blk.19.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  176:           blk.19.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  177:             blk.19.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  178:           blk.19.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  179:          blk.19.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  180:           blk.19.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  181:             blk.20.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  182:             blk.20.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  183:             blk.20.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  184:        blk.20.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  185:           blk.20.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  186:             blk.20.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  187:           blk.20.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  188:          blk.20.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  189:           blk.20.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  190:             blk.21.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  191:             blk.21.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  192:             blk.21.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  193:        blk.21.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  194:           blk.21.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  195:             blk.21.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  196:           blk.21.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  197:          blk.21.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  198:           blk.21.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  199:               output_norm.weight f32      [  2048,     1,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - tensor  200:                    output.weight q6_K     [  2048, 32003,     1,     1 ]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv   0:                       general.architecture str              = llama
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv   1:                               general.name str              = py007_tinyllama-1.1b-chat-v0.3
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv   4:                          llama.block_count u32              = 22
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv  11:                          general.file_type u32              = 15
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32003]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32003]   = [0.000000, 0.000000, 0.000000, 0.0000...
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32003]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - kv  19:               general.quantization_version u32              = 2
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - type  f32:   45 tensors
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - type q4_K:  135 tensors
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_model_loader: - type q6_K:   21 tensors
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_vocab: special tokens definition check successful ( 262/32003 ).
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: format           = GGUF V2
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: arch             = llama
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: vocab type       = SPM
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: n_vocab          = 32003
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: n_merges         = 0
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: n_ctx_train      = 2048
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: n_embd           = 2048
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: n_head           = 32
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: n_head_kv        = 4
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: n_layer          = 22
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: n_rot            = 64
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: n_gqa            = 8
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: f_norm_eps       = 0.0e+00
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: f_clamp_kqv      = 0.0e+00
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: f_max_alibi_bias = 0.0e+00
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: n_ff             = 5632
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: rope scaling     = linear
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: freq_base_train  = 10000.0
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: freq_scale_train = 1
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: n_yarn_orig_ctx  = 2048
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: rope_finetuned   = unknown
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: model type       = ?B
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: model ftype      = mostly Q4_K - Medium
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: model params     = 1.10 B
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: model size       = 636.18 MiB (4.85 BPW)
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: general.name     = py007_tinyllama-1.1b-chat-v0.3
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: BOS token        = 1 '<s>'
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: EOS token        = 2 '</s>'
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: UNK token        = 0 '<unk>'
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_print_meta: LF token         = 13 '<0x0A>'
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_tensors: ggml ctx size =  636.26 MiB
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_tensors: using CUDA for GPU acceleration
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_tensors: mem required  =  408.67 MiB
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_tensors: offloading 9 repeating layers to GPU
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_tensors: offloaded 9/23 layers to GPU
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llm_load_tensors: VRAM used: 227.59 MiB
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr ......................................................................................
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_new_context_with_model: n_ctx      = 1024
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_new_context_with_model: freq_base  = 10000.0
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_new_context_with_model: freq_scale = 1
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_kv_cache_init: VRAM kv self = 9.00 MB
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_new_context_with_model: KV self size  =   22.00 MiB, K (f16):   11.00 MiB, V (f16):   11.00 MiB
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_build_graph: non-view tensors processed: 466/466
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_new_context_with_model: compute buffer total size = 81.07 MiB
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_new_context_with_model: VRAM scratch buffer: 78.00 MiB
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr llama_new_context_with_model: total VRAM used: 314.59 MiB (model: 227.59 MiB, context: 87.00 MiB)
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr Available slots:
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr  -> Slot 0 - max context: 1024
localai-local-ai-1  | 11:52PM DBG [llama-cpp] Loads OK
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr slot 0 is processing [task id: 0]
localai-local-ai-1  | 11:52PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr slot 0 : kv cache rm - [0, end)
localai-local-ai-1  | [127.0.0.1]:36472 200 - GET /readyz
localai-local-ai-1  | [127.0.0.1]:36128 200 - GET /readyz
localai-local-ai-1  | 11:54PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-local-ai-1  | [127.0.0.1]:41876 200 - GET /readyz
localai-local-ai-1  | 11:55PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-local-ai-1  | [127.0.0.1]:54262 200 - GET /readyz
localai-local-ai-1  | 11:56PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-local-ai-1  | [127.0.0.1]:33584 200 - GET /readyz
localai-local-ai-1  | 11:57PM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:34953): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
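The repeated "context shift" lines here are telling: n_keep = 0 with n_discard = 511 means the server keeps evicting half of the 1024-token window, i.e. generation fills the context and never stops. A quick way to bound a test request is to cap the output (a sketch; the model name is the one configured in this thread, the prompt is a placeholder, and max_tokens / stop are standard OpenAI-compatible request fields that LocalAI accepts):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf",
  "messages": [{"role": "user", "content": "How are you?"}],
  "max_tokens": 64,
  "stop": ["</s>", "### Instruction:"]
}'

If a capped request returns promptly, the problem is the model looping rather than the load failing.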
wuxxin commented 9 months ago

Try backend: llama-cpp, as it is the most up-to-date one.
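A minimal model YAML sketch that pins this backend, mirroring the values visible in the debug output further down in this thread (name, file, and template names are taken from those logs; adjust as needed):

name: thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf
backend: llama-cpp
parameters:
  model: tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf
f16: true
gpu_layers: 50
context_size: 1024
template:
  chat: chat
  completion: completion

Placed in the mounted /models directory, a config like this makes LocalAI skip the backend auto-detection loop entirely.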

Taronyuu commented 9 months ago

Try backend: llama-cpp, as it is the most up-to-date one.

I've tried llama-cpp; however, the same issue persists. The model is (partially) offloaded to my GPU, but no response comes back, even after several minutes. I did some research (e.g. https://github.com/mudler/LocalAI/issues/1281) and I think the model is simply looping for whatever reason. I have a somewhat older CPU and only one GPU, so maybe that is related. I'll try a 13b model to see how that goes.
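Since the original error is SIGILL (illegal instruction), it is also worth checking whether the CPU actually supports the instruction sets the binaries were built with; Z68-era boards typically have AVX but not AVX2. A quick check (plain shell, nothing LocalAI-specific):

grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u

If avx2 is missing, rebuilding inside the container with those extensions disabled is the usual workaround (a sketch, assuming the REBUILD mechanism described in LocalAI's docs; the flag names follow llama.cpp's build options of that time — added to the .env file the compose service already loads):

# .env — rebuild the backends without the newer SIMD extensions
REBUILD=true
CMAKE_ARGS=-DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF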

docker compose up
WARN[0000] Found orphan containers ([localai-local-ai-1]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[+] Running 1/0
 ✔ Container localai-api-1  Created                                                                                                                            0.0s
Attaching to localai-api-1
localai-api-1  | go mod edit -replace github.com/nomic-ai/gpt4all/gpt4all-bindings/golang=/build/sources/gpt4all/gpt4all-bindings/golang
localai-api-1  | go mod edit -replace github.com/go-skynet/go-ggml-transformers.cpp=/build/sources/go-ggml-transformers
localai-api-1  | go mod edit -replace github.com/donomii/go-rwkv.cpp=/build/sources/go-rwkv
localai-api-1  | go mod edit -replace github.com/ggerganov/whisper.cpp=/build/sources/whisper.cpp
localai-api-1  | go mod edit -replace github.com/ggerganov/whisper.cpp/bindings/go=/build/sources/whisper.cpp/bindings/go
localai-api-1  | go mod edit -replace github.com/go-skynet/go-bert.cpp=/build/sources/go-bert
localai-api-1  | go mod edit -replace github.com/mudler/go-stable-diffusion=/build/sources/go-stable-diffusion
localai-api-1  | go mod edit -replace github.com/mudler/go-piper=/build/sources/go-piper
localai-api-1  | go mod download
localai-api-1  | touch prepare-sources
localai-api-1  | touch prepare
localai-api-1  | I local-ai build info:
localai-api-1  | I BUILD_TYPE: cublas
localai-api-1  | I GO_TAGS:
localai-api-1  | I LD_FLAGS: -X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"
localai-api-1  | CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=fb6a5bc" -X "github.com/go-skynet/LocalAI/internal.Commit=fb6a5bc620cc39657e03ef958b09230acdf977a0"" -tags "" -o local-ai ./
localai-api-1  | 7:56AM INF Starting LocalAI using 3 threads, with models path: /models
localai-api-1  | 7:56AM INF LocalAI version: fb6a5bc (fb6a5bc620cc39657e03ef958b09230acdf977a0)
localai-api-1  | 7:56AM DBG Model: thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf (config: {PredictionOptions:{Model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf F16:true Threads:0 Debug:false Roles:map[] Embeddings:false Backend:llama-cpp TemplateConfig:{Chat:chat ChatMessage: Completion:completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:50 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false})
localai-api-1  | 7:56AM DBG Extracting backend assets files to /tmp/localai/backend_data
localai-api-1  |
localai-api-1  |  ┌───────────────────────────────────────────────────┐
localai-api-1  |  │                   Fiber v2.50.0                   │
localai-api-1  |  │               http://127.0.0.1:8080               │
localai-api-1  |  │       (bound on host 0.0.0.0 and port 8080)       │
localai-api-1  |  │                                                   │
localai-api-1  |  │ Handlers ............ 74  Processes ........... 1 │
localai-api-1  |  │ Prefork ....... Disabled  PID ............... 136 │
localai-api-1  |  └───────────────────────────────────────────────────┘
localai-api-1  |
localai-api-1  | [172.19.0.1]:50422 200 - GET /v1/models
localai-api-1  | 7:57AM DBG Request received:
localai-api-1  | 7:57AM DBG Configuration read: &{PredictionOptions:{Model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf F16:true Threads:3 Debug:true Roles:map[] Embeddings:false Backend:llama-cpp TemplateConfig:{Chat:chat ChatMessage: Completion:completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:50 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false}
localai-api-1  | 7:57AM DBG Parameters: &{PredictionOptions:{Model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:thebloke__tinyllama-1.1b-chat-v0.3-gguf__tinyllama-1.1b-chat-v0.3.q4_k_m.gguf F16:true Threads:3 Debug:true Roles:map[] Embeddings:false Backend:llama-cpp TemplateConfig:{Chat:chat ChatMessage: Completion:completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:50 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false}
localai-api-1  | 7:57AM DBG Prompt (before templating): How are you?
localai-api-1  | 7:57AM DBG Template found, input modified to: Below is an instruction that describes a task. Write a response that appropriately completes the request.
localai-api-1  |
localai-api-1  | ### Instruction:
localai-api-1  | How are you?
localai-api-1  |
localai-api-1  | ### Response:
localai-api-1  | 7:57AM DBG Prompt (after templating): Below is an instruction that describes a task. Write a response that appropriately completes the request.
localai-api-1  |
localai-api-1  | ### Instruction:
localai-api-1  | How are you?
localai-api-1  |
localai-api-1  | ### Response:
localai-api-1  | 7:57AM DBG Loading model llama-cpp from tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf
localai-api-1  | 7:57AM DBG Loading model in memory from file: /models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf
localai-api-1  | 7:57AM DBG Loading Model tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf with gRPC (file: /models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf) (backend: llama-cpp): {backendString:llama-cpp model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf threads:3 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0004901e0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
localai-api-1  | 7:57AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
localai-api-1  | 7:57AM DBG GRPC Service for tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf will be running at: '127.0.0.1:37427'
localai-api-1  | 7:57AM DBG GRPC Service state dir: /tmp/go-processmanager3167540258
localai-api-1  | 7:57AM DBG GRPC Service Started
localai-api-1  | rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37427: connect: connection refused"
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stdout Server listening on 127.0.0.1:37427
localai-api-1  | 7:57AM DBG GRPC Service Ready
localai-api-1  | 7:57AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf ContextSize:1024 Seed:0 NBatch:512 F16Memory:true MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:50 MainGPU: TensorSplit: Threads:3 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr ggml_init_cublas: found 1 CUDA devices:
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr   Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: loaded meta data with 20 key-value pairs and 201 tensors from /models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf (version GGUF V2)
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  2048, 32003,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor    1:              blk.0.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor    2:              blk.0.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor    3:              blk.0.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor    4:         blk.0.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor    5:            blk.0.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor    6:              blk.0.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor    7:            blk.0.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor    8:           blk.0.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor    9:            blk.0.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   10:              blk.1.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   11:              blk.1.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   12:              blk.1.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   13:         blk.1.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   14:            blk.1.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   15:              blk.1.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   16:            blk.1.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   17:           blk.1.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   18:            blk.1.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   19:              blk.2.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   20:              blk.2.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   21:              blk.2.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   22:         blk.2.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   23:            blk.2.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   24:              blk.2.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   25:            blk.2.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   26:           blk.2.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   27:            blk.2.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   28:              blk.3.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   29:              blk.3.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   30:              blk.3.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   31:         blk.3.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   32:            blk.3.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   33:              blk.3.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   34:            blk.3.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   35:           blk.3.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   36:            blk.3.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   37:              blk.4.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   38:              blk.4.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   39:              blk.4.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   40:         blk.4.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   41:            blk.4.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   42:              blk.4.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   43:            blk.4.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   44:           blk.4.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   45:            blk.4.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   46:              blk.5.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   47:              blk.5.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   48:              blk.5.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   49:         blk.5.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   50:            blk.5.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   51:              blk.5.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   52:            blk.5.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   53:           blk.5.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   54:            blk.5.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   55:              blk.6.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   56:              blk.6.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   57:              blk.6.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   58:         blk.6.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   59:            blk.6.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   60:              blk.6.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   61:            blk.6.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   62:           blk.6.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   63:            blk.6.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   64:              blk.7.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   65:              blk.7.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   66:              blk.7.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   67:         blk.7.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   68:            blk.7.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   69:              blk.7.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   70:            blk.7.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   71:           blk.7.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   72:            blk.7.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   73:              blk.8.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   74:              blk.8.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   75:              blk.8.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   76:         blk.8.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   77:            blk.8.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   78:              blk.8.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   79:            blk.8.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   80:           blk.8.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   81:            blk.8.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   82:              blk.9.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   83:              blk.9.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   84:              blk.9.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   85:         blk.9.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   86:            blk.9.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   87:              blk.9.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   88:            blk.9.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   89:           blk.9.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   90:            blk.9.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   91:             blk.10.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   92:             blk.10.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   93:             blk.10.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   94:        blk.10.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   95:           blk.10.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   96:             blk.10.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   97:           blk.10.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   98:          blk.10.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor   99:           blk.10.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  100:             blk.11.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  101:             blk.11.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  102:             blk.11.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  103:        blk.11.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  104:           blk.11.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  105:             blk.11.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  106:           blk.11.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  107:          blk.11.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  108:           blk.11.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  109:             blk.12.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  110:             blk.12.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  111:             blk.12.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  112:        blk.12.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  113:           blk.12.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  114:             blk.12.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  115:           blk.12.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  116:          blk.12.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  117:           blk.12.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  118:             blk.13.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  119:             blk.13.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  120:             blk.13.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  121:        blk.13.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  122:           blk.13.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  123:             blk.13.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  124:           blk.13.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  125:          blk.13.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  126:           blk.13.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  127:             blk.14.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  128:             blk.14.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  129:             blk.14.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  130:        blk.14.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  131:           blk.14.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  132:             blk.14.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  133:           blk.14.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  134:          blk.14.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  135:           blk.14.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  136:             blk.15.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  137:             blk.15.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  138:             blk.15.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  139:        blk.15.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  140:           blk.15.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  141:             blk.15.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  142:           blk.15.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  143:          blk.15.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  144:           blk.15.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  145:             blk.16.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  146:             blk.16.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  147:             blk.16.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  148:        blk.16.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  149:           blk.16.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  150:             blk.16.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  151:           blk.16.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  152:          blk.16.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  153:           blk.16.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  154:             blk.17.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  155:             blk.17.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  156:             blk.17.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  157:        blk.17.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  158:           blk.17.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  159:             blk.17.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  160:           blk.17.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  161:          blk.17.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  162:           blk.17.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  163:             blk.18.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  164:             blk.18.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  165:             blk.18.attn_v.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  166:        blk.18.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  167:           blk.18.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  168:             blk.18.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  169:           blk.18.ffn_down.weight q4_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  170:          blk.18.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  171:           blk.18.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  172:             blk.19.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  173:             blk.19.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  174:             blk.19.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  175:        blk.19.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  176:           blk.19.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  177:             blk.19.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  178:           blk.19.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  179:          blk.19.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  180:           blk.19.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  181:             blk.20.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  182:             blk.20.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  183:             blk.20.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  184:        blk.20.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  185:           blk.20.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  186:             blk.20.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  187:           blk.20.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  188:          blk.20.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  189:           blk.20.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  190:             blk.21.attn_q.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  191:             blk.21.attn_k.weight q4_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  192:             blk.21.attn_v.weight q6_K     [  2048,   256,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  193:        blk.21.attn_output.weight q4_K     [  2048,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  194:           blk.21.ffn_gate.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  195:             blk.21.ffn_up.weight q4_K     [  2048,  5632,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  196:           blk.21.ffn_down.weight q6_K     [  5632,  2048,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  197:          blk.21.attn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  198:           blk.21.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  199:               output_norm.weight f32      [  2048,     1,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - tensor  200:                    output.weight q6_K     [  2048, 32003,     1,     1 ]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv   0:                       general.architecture str              = llama
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv   1:                               general.name str              = py007_tinyllama-1.1b-chat-v0.3
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv   4:                          llama.block_count u32              = 22
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv  11:                          general.file_type u32              = 15
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32003]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32003]   = [0.000000, 0.000000, 0.000000, 0.0000...
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32003]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - kv  19:               general.quantization_version u32              = 2
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - type  f32:   45 tensors
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - type q4_K:  135 tensors
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_model_loader: - type q6_K:   21 tensors
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_vocab: special tokens definition check successful ( 262/32003 ).
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: format           = GGUF V2
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: arch             = llama
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: vocab type       = SPM
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: n_vocab          = 32003
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: n_merges         = 0
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: n_ctx_train      = 2048
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: n_embd           = 2048
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: n_head           = 32
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: n_head_kv        = 4
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: n_layer          = 22
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: n_rot            = 64
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: n_gqa            = 8
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: f_norm_eps       = 0.0e+00
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: f_clamp_kqv      = 0.0e+00
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: f_max_alibi_bias = 0.0e+00
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: n_ff             = 5632
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: rope scaling     = linear
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: freq_base_train  = 10000.0
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: freq_scale_train = 1
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: n_yarn_orig_ctx  = 2048
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: rope_finetuned   = unknown
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: model type       = ?B
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: model ftype      = mostly Q4_K - Medium
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: model params     = 1.10 B
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: model size       = 636.18 MiB (4.85 BPW)
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: general.name     = py007_tinyllama-1.1b-chat-v0.3
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: BOS token        = 1 '<s>'
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: EOS token        = 2 '</s>'
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: UNK token        = 0 '<unk>'
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_print_meta: LF token         = 13 '<0x0A>'
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_tensors: ggml ctx size =  636.26 MiB
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_tensors: using CUDA for GPU acceleration
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_tensors: mem required  =   35.23 MiB
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_tensors: offloading 22 repeating layers to GPU
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_tensors: offloading non-repeating layers to GPU
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_tensors: offloaded 23/23 layers to GPU
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llm_load_tensors: VRAM used: 601.02 MiB
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr ......................................................................................
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_new_context_with_model: n_ctx      = 1024
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_new_context_with_model: freq_base  = 10000.0
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_new_context_with_model: freq_scale = 1
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_kv_cache_init: VRAM kv self = 22.00 MB
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_new_context_with_model: KV self size  =   22.00 MiB, K (f16):   11.00 MiB, V (f16):   11.00 MiB
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_build_graph: non-view tensors processed: 466/466
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_new_context_with_model: compute buffer total size = 81.07 MiB
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_new_context_with_model: VRAM scratch buffer: 78.00 MiB
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr llama_new_context_with_model: total VRAM used: 701.03 MiB (model: 601.02 MiB, context: 100.00 MiB)
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr Available slots:
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr  -> Slot 0 - max context: 1024
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0 is processing [task id: 0]
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0 : kv cache rm - [0, end)
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | [127.0.0.1]:60232 200 - GET /readyz
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:57AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | [127.0.0.1]:53946 200 - GET /readyz
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:58AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | [127.0.0.1]:44568 200 - GET /readyz
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 7:59AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | [127.0.0.1]:47844 200 - GET /readyz
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
localai-api-1  | 8:00AM DBG GRPC(tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf-127.0.0.1:37427): stderr slot 0: context shift - n_keep = 0, n_left = 1022, n_discard = 511
mudler commented 9 months ago

You are facing #1333 - there is no solution for now; a few models trigger this behavior. I'd suggest switching to a different model until it gets fixed upstream (https://github.com/ggerganov/llama.cpp/issues/3969).
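
To switch models, a minimal model definition under /models is enough. A sketch, assuming an already-downloaded GGUF file (the file and model names here are illustrative, not specific to this issue):

# /models/luna.yaml - minimal model definition (names are illustrative)
name: luna
backend: llama          # llama.cpp backend
context_size: 1024
f16: true
gpu_layers: 23          # offload all layers to the GPU
parameters:
  model: luna-ai-llama2-uncensored.Q4_K_M.gguf   # GGUF file placed in /models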

Taronyuu commented 9 months ago

You are facing #1333 - there is no solution for now; a few models trigger this behavior. I'd suggest switching to a different model until it gets fixed upstream (ggerganov/llama.cpp#3969).

I had indeed figured that out from another issue; I just downloaded a 13B model and it works as expected. I'm not sure whether the trigger is the specific model or its size, but 13B is my sweet spot anyway.
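
For reference, a request along these lines is enough to verify that a model loads and responds (the model name is illustrative and must match a file or YAML definition under /models):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codellama-13b.Q4_K_M.gguf",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'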

Apologies for creating an issue unrelated to LocalAI, and I appreciate everyone's support. I'll close this ticket now, knowing that it is an upstream issue. Thank you! 🙏🏻

gerroon commented 8 months ago

So which models are working? There are many models out there, and I'm afraid of wasting gigabytes trying to find the right one.

JackBekket commented 7 months ago

So which models are working? There are many models out there, and I'm afraid of wasting gigabytes trying to find the right one.

I just tried the Wizard-Vicuna Uncensored model in the new GGUF format, in both its 13B and 30B variants.

The 30B version is not working, which is what brought me here. The 13B version works fine:

wget https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGUF/resolve/main/Wizard-Vicuna-13B-Uncensored.Q4_K_M.gguf
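
Note that huggingface.co serves the raw file from /resolve/main/, while /blob/main/ returns the HTML page; the command above uses the former. As a quick sanity check after downloading, a valid GGUF file starts with the magic bytes "GGUF" (file path assumed to match the download above):

# prints "GGUF" for a well-formed file; an HTML page would print "<!do" or similar
head -c 4 Wizard-Vicuna-13B-Uncensored.Q4_K_M.gguf; echo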