Open allenhaozi opened 1 year ago
Deployed in k8s, the GPU has been configured, but it should not take effect
resources:
limits:
cpu: "1"
memory: 50Gi
nvidia.com/gpu: "4"
requests:
cpu: "1"
memory: 50Gi
nvidia.com/gpu: "4"
I have the same problem when running LocalAI in a Docker container. The logs contain numerous lines of the form:
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33427: connect: connection refused"
with varying port numbers
FYI: the problem occurs both in local Docker builds and the ":latest" image from go-skynet
Yes same here. used the latest version with GPT4All model and it just gives errors. Same on Kubernetes and local
If it can help, my (very similar) error message :
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "ggml-gpt4all-j",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.9
}'
{"error":{"code":500,"message":"rpc error: code = Unknown desc = unimplemented","type":""}}
I am currently trying to compile a previous release in order to see until when LocalAI worked without this problem.
Unfortunately, the Docker build command seems to expect the source to have been checked-out as a Git project and refuses to build from an unpacked ZIP archive...
Thus, I directly checked out v1.21.0, built a Docker image locally, ran it...and had the same problem as before.
For the records: here is what I did
git clone https://github.com/go-skynet/LocalAI.git --branch v1.21.0
cd LocalAI
docker build -t localai .
docker run --rm --name localai \
-v "/path/to/your/local/models/folder":/build/models \
-p 127.0.0.1:8080:8080 \
localai
I also tried v1.20.1 and v1.20.0 - but these builds failed with "ggml.c:(.text+0x2e860): multiple definition of `clear_numa_thread_affinity'; /build/go-llama/libbinding.a(ggml.o):ggml.c:(.text+0x2e860): first defined here"
Building v1.19.2 succeeded - but it could not load my model (LLaMA 2) which makes it useless for me...
I don't have the time to check every previous version, but perhaps somebody else has...
did you tried running with REBUILD=true
? also please attach full logs with DEBUG=true
Ok, so I
docker build -t localai .
anddocker run --rm --name localai \
-v "/path/to/your/local/models/folder":/build/models \
-p 127.0.0.1:8080:8080 \
localai
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model":"llama-2-13b.ggmlv3.q4_0",
"messages": [
{"role":"system", "content":"Perform the instructions to the best of your ability.\n"},
{"role": "user", "content": "### Instruction: who was Joseph Weizenbaum?\n### Response:"}
],
"temperature": 0.0,
"max_tokens": 256,
"stream": false
}'
with the same result as before. Here are the logs (mind the "skipping rebuild")
@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
CPU: no AVX found
CPU: no AVX2 found
CPU: no AVX512 found
@@@@@
3:31AM DBG no galleries to load
3:31AM INF Starting LocalAI using 4 threads, with models path: /build/models
3:31AM INF LocalAI version: v1.22.0-19-gdde12b4 (dde12b492b2da4f14d66047a42b66bff80e223af)
┌───────────────────────────────────────────────────┐
│ Fiber v2.48.0 │
│ http://127.0.0.1:8080 │
│ (bound on host 0.0.0.0 and port 8080) │
│ │
│ Handlers ............ 31 Processes ........... 1 │
│ Prefork ....... Disabled PID ................ 14 │
└───────────────────────────────────────────────────┘
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:38673: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41499: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41677: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:36709: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37449: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33973: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42041: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45563: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34759: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:42351: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:44449: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:36327: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35585: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45917: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:35687: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:33939: connect: connection refused"
here is my .env
## Set number of threads.
## Note: prefer the number of physical cores. Overbooking the CPU degrades performance notably.
THREADS=4
## Specify a different bind address (defaults to ":8080")
ADDRESS=0.0.0.0:8080
## Default models context size
CONTEXT_SIZE=4096
## Define galleries.
## models will to install will be visible in `/models/available`
#GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"name":"huggingface","url":"github:go-skynet/model-gallery/huggingface.yaml"}]
## CORS settings
CORS=true
CORS_ALLOW_ORIGINS=*
## Default path for models
# MODELS_PATH=/models
## Enable debug mode
DEBUG=true
## Specify a build type. Available: cublas, openblas, clblas.
# BUILD_TYPE=metal
## Uncomment and set to true to enable rebuilding from source
REBUILD=true
## Enable go tags, available: stablediffusion, tts
## stablediffusion: image generation with stablediffusion
## tts: enables text-to-speech with go-piper
## (requires REBUILD=true)
#
# GO_TAGS=stablediffusion
## Path where to store generated images
# IMAGE_PATH=/tmp
## Specify a default upload limit in MB (whisper)
# UPLOAD_LIMIT
here is the environment of the running container as reported by Docker (mind the "REBUILD=false")
Environment
PATH
/usr/local/cuda/bin:/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
GOLANG_VERSION
1.20.6
GOPATH
/go
BUILD_TYPE
EXTERNAL_GRPC_BACKENDS
huggingface-embeddings:/build/extra/grpc/huggingface/huggingface.py
REBUILD
false
HEALTHCHECK_ENDPOINT
http://localhost:8080/readyz
Mounts
/build/models
/Users/andreas/rozek/AI/models/meta-ai/llama2
Port
8080/tcp
127.0.0.1:8080
adding to this, same issues here both local docker & EKS via AL2 amd64
I can get through to /v1/models ok, but can't do anything with a model otherwise I get a timeout & various forms of:
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp [::1]:33427: connect: connection refused"
seems it might just be an issue with the localAI image. building from scratch in a container works ok :
` FROM public.ecr.aws/amazonlinux/amazonlinux:latest
RUN yum install git -y RUN yum install golang -y RUN yum group install "Development Tools" -y RUN yum install cmake -y
RUN git clone https://github.com/go-skynet/LocalAI.git
WORKDIR /LocalAI
RUN make build
COPY . .
EXPOSE 8080
ENTRYPOINT [ "./local-ai", "--debug", "--models-path", "./models", "" ] `
@rozek @nabbl @Mer0me I had precisely the same error message as you had, so our problems may be the same. I inspected the usage of hardware resources by docker containers, and at least in my case, it was the memory limit issue. Docker Desktop (in Ubuntu 22.04) ships with a default memory limit smaller than the size of LLM (gpt4all in my case). So I set the memory limit 10GB, large enough to have gpt4all, and then it worked.
It was difficult to figure out it was the memory limit issue because the error message does not deliver it directly. Also, I don't know well about Docker, nor about LLMs, so it took some time for me to figure out the source of the problem in my machine. I think it will definitely help to include a note about increasing Docker's memory limit enough to have LLM on memory in the getting started page: https://localai.io/basics/getting_started/index.html
Note that I also uncommented REBUILD=true
in .env file.
Also, increasing the memory of docker by including --memory
when running container did not help either. At least in my machine, I needed to increase it in the Docker Desktop application, and it seems like a common confusion (see https://stackoverflow.com/a/44533437).
@allenhaozi Also given that your debug log says it failed to load the template, I wonder if it is the issue of (1) the wrong path set to find model template, or (2) not enough memory to load template.
@allenhaozi Also given that your debug log says it failed to load the template, I wonder if it is the issue of (1) the wrong path set to find model template, or (2) not enough memory to load template.
@swoh816 , use quay.io/go-skynet/local-ai:v1.23.2-cublas-cuda11 image, got the following errors request:
{
"model": "chatglm2-6b",
"messages": [
{
"role": "user",
"content": "How are you?"
}
],
"temperature": 0.9
}
response:
{
"error": {
"code": 500,
"message": "rpc error: code = Unknown desc = unimplemented",
"type": ""
}
}
log:
4:07AM DBG Request received:
4:07AM DBG Configuration read: &{PredictionOptions:{Model:chatglm2-6b Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 F16:false NUMA:false Threads:1 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false Grammar: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} SystemPrompt: RMSNormEps:0 NGQA:0}
4:07AM DBG Parameters: &{PredictionOptions:{Model:chatglm2-6b Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 F16:false NUMA:false Threads:1 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false Grammar: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} SystemPrompt: RMSNormEps:0 NGQA:0}
4:07AM DBG Prompt (before templating): How are you?
4:07AM DBG Template failed loading: failed loading a template for chatglm2-6b
4:07AM DBG Prompt (after templating): How are you?
4:07AM DBG Model already loaded in memory: chatglm2-6b
4:07AM DBG Model 'chatglm2-6b' already loaded
[10.100.116.87]:50320 500 - POST /v1/chat/completions
I just followed the example and have the same issue here with
docker-compose version 1.29.2, build unknown
Increasing the memory as described by @swoh816 is what resolved this error for me.
Additionally, once that was fixed, text generation was extremely slow. The fix for that was to set threads equal to the number of CPU on the Kubernetes node.
i increased the memory limit to 64 G still same message. i am using the example from "getting started".
when i uncommented REBUILD=true in .env file, i got the following error
curl: (56) Recv failure: Connection reset by peer
anything else i can try?
Could someone share what hardware/system configuration this does build and run successfully in?
I followed the example and ran into the same problem here
Also getting a similar issue here.
.env
THREADS=8
CONTEXT_SIZE=4096
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]
MODELS_PATH=/models
DEBUG=true
COMPEL=0
SINGLE_ACTIVE_BACKEND=true
BUILD_TYPE=cublas
REBUILD=true
GO_TAGS=stablediffusion
IMAGE_PATH=/tmp
docker-compose.yaml
version: '3.6'
services:
api:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
image: quay.io/go-skynet/local-ai:sha-238fec2-cublas-cuda12-ffmpeg-core
tty: true # enable colorized logs
restart: always # should this be on-failure ?
ports:
- 8080:8080
env_file:
- .env
volumes:
- ./models:/models
- ./images/:/tmp/generated/images/
command: ["/usr/bin/local-ai" ]
Request & Error
…/AI/LocalAI שׂ master via 🐹 on ☁️ (us-east-1)
🕙 19:58:46 ❯❯ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llama2-7b-chat-gguf",
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.9
}'
{"error":{"code":500,"message":"could not load model: rpc error: code = Unknown desc = failed loading model","type":""}}⏎
Container Logs
2023-12-03 19:58:44 12:58AM ERR error processing message {SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions Role:User: RoleName:user Content:How are you? MessageIndex:0} using template "llama2-7b-chat-gguf-chat": template: prompt:3:5: executing "prompt" at <.Input>: can't evaluate field Input in type model.ChatMessageTemplateData. Skipping!
2023-12-03 19:58:44 rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:46545: connect: connection refused"
Debug
2023-12-03 20:06:12 [127.0.0.1]:39930 200 - GET /readyz
2023-12-03 20:06:52 1:06AM DBG Request received:
2023-12-03 20:06:52 1:06AM DBG Configuration read: &{PredictionOptions:{Model: Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama2-7b-chat-gguf F16:false Threads:8 Debug:true Roles:map[assistant:Assitant: assistant_function_call:Function Call: function:Function Result: system:System: user:User:] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage:llama2-7b-chat-gguf-chat Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
2023-12-03 20:06:52 1:06AM DBG Parameters: &{PredictionOptions:{Model: Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:llama2-7b-chat-gguf F16:false Threads:8 Debug:true Roles:map[assistant:Assitant: assistant_function_call:Function Call: function:Function Result: system:System: user:User:] Embeddings:false Backend:llama TemplateConfig:{Chat: ChatMessage:llama2-7b-chat-gguf-chat Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
2023-12-03 20:06:52 1:06AM ERR error processing message {SystemPrompt:You are a helpful assistant, below is a conversation, please respond with the next message and do not ask follow-up questions Role:User: RoleName:user Content:How are you? MessageIndex:0} using template "llama2-7b-chat-gguf-chat": template: prompt:3:5: executing "prompt" at <.Input>: can't evaluate field Input in type model.ChatMessageTemplateData. Skipping!
2023-12-03 20:06:52 1:06AM DBG Prompt (before templating): User:How are you?
2023-12-03 20:06:52 1:06AM DBG Template failed loading: failed loading a template for
2023-12-03 20:06:52 1:06AM DBG Prompt (after templating): User:How are you?
2023-12-03 20:06:52 1:06AM DBG Loading model llama from
2023-12-03 20:06:52 1:06AM DBG Stopping all backends except ''
2023-12-03 20:06:52 1:06AM DBG Loading model in memory from file: /models
2023-12-03 20:06:52 1:06AM DBG Loading Model with gRPC (file: /models) (backend: llama): {backendString:llama model: threads:8 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0001da5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:true parallelRequests:false}
2023-12-03 20:06:52 1:06AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama
2023-12-03 20:06:52 1:06AM DBG GRPC Service for will be running at: '127.0.0.1:34533'
2023-12-03 20:06:52 1:06AM DBG GRPC Service state dir: /tmp/go-processmanager3341423294
2023-12-03 20:06:52 1:06AM DBG GRPC Service Started
2023-12-03 20:06:53 rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:34533: connect: connection refused"
2023-12-03 20:06:53 1:06AM DBG GRPC(-127.0.0.1:34533): stderr 2023/12/04 01:06:53 gRPC Server listening at 127.0.0.1:34533
2023-12-03 20:06:55 1:06AM DBG GRPC Service Ready
2023-12-03 20:06:55 1:06AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model: ContextSize:4096 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr create_gpt_params_cuda: loading model /models
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr ggml_init_cublas: found 1 CUDA devices:
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr Device 0: NVIDIA GeForce RTX 4080, compute capability 8.9
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr gguf_init_from_file: invalid magic number 00000000
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr error loading model: llama_model_loader: failed to load model from /models
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr llama_load_model_from_file: failed to load model
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr llama_init_from_gpt_params: error: failed to load model '/models'
2023-12-03 20:06:55 1:06AM DBG GRPC(-127.0.0.1:34533): stderr load_binding_model: error: unable to load model
2023-12-03 20:06:55 [172.18.0.1]:54898 500 - POST /v1/chat/completions
2023-12-03 20:07:12 [127.0.0.1]:37870 200 - GET /readyz
If I change to the lunademo
model from the model-gallery (also used in the model setup how-to), I get many more errors in debug:
2023-12-03 20:12:13 [127.0.0.1]:51240 200 - GET /readyz
2023-12-03 20:12:28 1:12AM DBG Request received:
2023-12-03 20:12:28 1:12AM DBG Configuration read: &{PredictionOptions:{Model:luna-ai-llama2-uncensored.Q4_K_M.gguf Language: N:0 TopP:0 TopK:0 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:lunademo F16:false Threads:10 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat:luna-chat-message ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
2023-12-03 20:12:28 1:12AM DBG Parameters: &{PredictionOptions:{Model:luna-ai-llama2-uncensored.Q4_K_M.gguf Language: N:0 TopP:0 TopK:0 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:lunademo F16:false Threads:10 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Chat:luna-chat-message ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:4096 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
2023-12-03 20:12:28 1:12AM DBG Prompt (before templating): How are you?
2023-12-03 20:12:28 1:12AM DBG Template found, input modified to: How are you?
2023-12-03 20:12:28
2023-12-03 20:12:28 ASSISTANT:
2023-12-03 20:12:28
2023-12-03 20:12:28 1:12AM DBG Prompt (after templating): How are you?
2023-12-03 20:12:28
2023-12-03 20:12:28 ASSISTANT:
2023-12-03 20:12:28
2023-12-03 20:12:28 1:12AM DBG Loading model llama from luna-ai-llama2-uncensored.Q4_K_M.gguf
2023-12-03 20:12:28 1:12AM DBG Stopping all backends except 'luna-ai-llama2-uncensored.Q4_K_M.gguf'
2023-12-03 20:12:28 1:12AM DBG [single-backend] Stopping
2023-12-03 20:12:28 1:12AM DBG Loading model in memory from file: /models/luna-ai-llama2-uncensored.Q4_K_M.gguf
2023-12-03 20:12:28 1:12AM DBG Loading Model luna-ai-llama2-uncensored.Q4_K_M.gguf with gRPC (file: /models/luna-ai-llama2-uncensored.Q4_K_M.gguf) (backend: llama): {backendString:llama model:luna-ai-llama2-uncensored.Q4_K_M.gguf threads:10 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0001da5a0 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:true parallelRequests:false}
2023-12-03 20:12:28 1:12AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama
2023-12-03 20:12:28 1:12AM DBG GRPC Service for luna-ai-llama2-uncensored.Q4_K_M.gguf will be running at: '127.0.0.1:45223'
2023-12-03 20:12:28 1:12AM DBG GRPC Service state dir: /tmp/go-processmanager3150385545
2023-12-03 20:12:28 1:12AM DBG GRPC Service Started
2023-12-03 20:12:28 rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:45223: connect: connection refused"
2023-12-03 20:12:28 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr 2023/12/04 01:12:28 gRPC Server listening at 127.0.0.1:45223
2023-12-03 20:12:30 1:12AM DBG GRPC Service Ready
2023-12-03 20:12:30 1:12AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:luna-ai-llama2-uncensored.Q4_K_M.gguf ContextSize:4096 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:10 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/luna-ai-llama2-uncensored.Q4_K_M.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr create_gpt_params_cuda: loading model /models/luna-ai-llama2-uncensored.Q4_K_M.gguf
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr ggml_init_cublas: found 1 CUDA devices:
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr Device 0: NVIDIA GeForce RTX 4080, compute capability 8.9
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /models/luna-ai-llama2-uncensored.Q4_K_M.gguf (version GGUF V2 (latest))
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 0: token_embd.weight q4_K [ 4096, 32000, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 1: blk.0.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 2: blk.0.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 3: blk.0.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 4: blk.0.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 5: blk.0.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 6: blk.0.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 7: blk.0.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 8: blk.0.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 9: blk.0.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 10: blk.1.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 11: blk.1.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 12: blk.1.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 13: blk.1.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 14: blk.1.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 15: blk.1.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 16: blk.1.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 17: blk.1.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 18: blk.1.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 19: blk.2.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 20: blk.2.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 21: blk.2.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 22: blk.2.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 23: blk.2.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 24: blk.2.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 25: blk.2.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 26: blk.2.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 27: blk.2.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 28: blk.3.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 29: blk.3.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 30: blk.3.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 31: blk.3.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 32: blk.3.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 33: blk.3.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 34: blk.3.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 35: blk.3.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 36: blk.3.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 37: blk.4.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 38: blk.4.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 39: blk.4.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 40: blk.4.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 41: blk.4.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 42: blk.4.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 43: blk.4.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 44: blk.4.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 45: blk.4.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 46: blk.5.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 47: blk.5.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 48: blk.5.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 49: blk.5.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 50: blk.5.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 51: blk.5.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 52: blk.5.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 53: blk.5.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 54: blk.5.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 55: blk.6.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 56: blk.6.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 57: blk.6.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 58: blk.6.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 59: blk.6.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 60: blk.6.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 61: blk.6.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 62: blk.6.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 63: blk.6.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 64: blk.7.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 65: blk.7.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 66: blk.7.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 67: blk.7.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 68: blk.7.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 69: blk.7.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 70: blk.7.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 71: blk.7.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 72: blk.7.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 73: blk.8.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 74: blk.8.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 75: blk.8.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 76: blk.8.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 77: blk.8.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 78: blk.8.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 79: blk.8.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 80: blk.8.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 81: blk.8.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 82: blk.9.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 83: blk.9.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 84: blk.9.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 85: blk.9.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 86: blk.9.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 87: blk.9.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 88: blk.9.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 89: blk.9.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 90: blk.9.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 91: blk.10.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 92: blk.10.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 93: blk.10.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 94: blk.10.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 95: blk.10.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 96: blk.10.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 97: blk.10.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 98: blk.10.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 99: blk.10.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 100: blk.11.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 101: blk.11.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 102: blk.11.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 103: blk.11.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 104: blk.11.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 105: blk.11.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 106: blk.11.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 107: blk.11.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 108: blk.11.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 109: blk.12.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 110: blk.12.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 111: blk.12.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 112: blk.12.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 113: blk.12.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 114: blk.12.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 115: blk.12.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 116: blk.12.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 117: blk.12.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 118: blk.13.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 119: blk.13.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 120: blk.13.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 121: blk.13.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 122: blk.13.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 123: blk.13.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 124: blk.13.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 125: blk.13.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 126: blk.13.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 127: blk.14.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 128: blk.14.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 129: blk.14.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 130: blk.14.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 131: blk.14.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 132: blk.14.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 133: blk.14.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 134: blk.14.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 135: blk.14.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 136: blk.15.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 137: blk.15.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 138: blk.15.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 139: blk.15.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 140: blk.15.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 141: blk.15.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 142: blk.15.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 143: blk.15.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 144: blk.15.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 145: blk.16.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 146: blk.16.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 147: blk.16.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 148: blk.16.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 149: blk.16.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 150: blk.16.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 151: blk.16.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 152: blk.16.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 153: blk.16.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 154: blk.17.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 155: blk.17.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 156: blk.17.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 157: blk.17.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 158: blk.17.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 159: blk.17.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 160: blk.17.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 161: blk.17.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 162: blk.17.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 163: blk.18.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 164: blk.18.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 165: blk.18.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 166: blk.18.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 167: blk.18.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 168: blk.18.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 169: blk.18.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 170: blk.18.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 171: blk.18.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 172: blk.19.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 173: blk.19.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 174: blk.19.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 175: blk.19.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 176: blk.19.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 177: blk.19.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 178: blk.19.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 179: blk.19.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 180: blk.19.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 181: blk.20.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 182: blk.20.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 183: blk.20.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 184: blk.20.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 185: blk.20.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 186: blk.20.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 187: blk.20.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 188: blk.20.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 189: blk.20.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 190: blk.21.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 191: blk.21.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 192: blk.21.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 193: blk.21.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 194: blk.21.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 195: blk.21.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 196: blk.21.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 197: blk.21.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 198: blk.21.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 199: blk.22.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 200: blk.22.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 201: blk.22.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 202: blk.22.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 203: blk.22.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 204: blk.22.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 205: blk.22.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 206: blk.22.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 207: blk.22.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 208: blk.23.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 209: blk.23.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 210: blk.23.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 211: blk.23.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 212: blk.23.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 213: blk.23.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 214: blk.23.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 215: blk.23.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 216: blk.23.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 217: blk.24.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 218: blk.24.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 219: blk.24.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 220: blk.24.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 221: blk.24.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 222: blk.24.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 223: blk.24.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 224: blk.24.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 225: blk.24.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 226: blk.25.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 227: blk.25.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 228: blk.25.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 229: blk.25.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 230: blk.25.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 231: blk.25.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 232: blk.25.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 233: blk.25.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 234: blk.25.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 235: blk.26.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 236: blk.26.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 237: blk.26.attn_v.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 238: blk.26.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 239: blk.26.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 240: blk.26.ffn_down.weight q4_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 241: blk.26.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 242: blk.26.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 243: blk.26.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 244: blk.27.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 245: blk.27.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 246: blk.27.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 247: blk.27.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 248: blk.27.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 249: blk.27.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 250: blk.27.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 251: blk.27.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 252: blk.27.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 253: blk.28.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 254: blk.28.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 255: blk.28.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 256: blk.28.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 257: blk.28.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 258: blk.28.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 259: blk.28.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 260: blk.28.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 261: blk.28.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 262: blk.29.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 263: blk.29.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 264: blk.29.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 265: blk.29.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 266: blk.29.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 267: blk.29.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 268: blk.29.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 269: blk.29.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 270: blk.29.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 271: blk.30.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 272: blk.30.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 273: blk.30.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 274: blk.30.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 275: blk.30.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 276: blk.30.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 277: blk.30.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 278: blk.30.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 279: blk.30.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 280: blk.31.attn_q.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 281: blk.31.attn_k.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 282: blk.31.attn_v.weight q6_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 283: blk.31.attn_output.weight q4_K [ 4096, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 284: blk.31.ffn_gate.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 285: blk.31.ffn_down.weight q6_K [ 11008, 4096, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 286: blk.31.ffn_up.weight q4_K [ 4096, 11008, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 287: blk.31.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 288: blk.31.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 289: output_norm.weight f32 [ 4096, 1, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - tensor 290: output.weight q6_K [ 4096, 32000, 1, 1 ]
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 0: general.architecture str
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 1: general.name str
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 2: llama.context_length u32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 3: llama.embedding_length u32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 4: llama.block_count u32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 5: llama.feed_forward_length u32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 6: llama.rope.dimension_count u32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 7: llama.attention.head_count u32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 8: llama.attention.head_count_kv u32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 10: general.file_type u32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 11: tokenizer.ggml.model str
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 12: tokenizer.ggml.tokens arr
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 13: tokenizer.ggml.scores arr
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 14: tokenizer.ggml.token_type arr
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 17: tokenizer.ggml.padding_token_id u32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - kv 18: general.quantization_version u32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - type f32: 65 tensors
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - type q4_K: 193 tensors
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_model_loader: - type q6_K: 33 tensors
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: format = GGUF V2 (latest)
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: arch = llama
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: vocab type = SPM
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_vocab = 32000
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_merges = 0
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_ctx_train = 2048
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_ctx = 4096
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_embd = 4096
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_head = 32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_head_kv = 32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_layer = 32
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_rot = 128
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_gqa = 1
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: f_norm_eps = 0.0e+00
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: f_norm_rms_eps = 1.0e-05
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: n_ff = 11008
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: freq_base = 10000.0
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: freq_scale = 1
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: model type = 7B
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: model ftype = mostly Q4_K - Medium
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: model params = 6.74 B
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: model size = 3.80 GiB (4.84 BPW)
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: general.name = tap-m_luna-ai-llama2-uncensored
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: BOS token = 1 '<s>'
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: EOS token = 2 '</s>'
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: UNK token = 0 '<unk>'
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: PAD token = 0 '<unk>'
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_print_meta: LF token = 13 '<0x0A>'
2023-12-03 20:12:30 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_tensors: ggml ctx size = 3891.34 MB
2023-12-03 20:12:32 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_tensors: using CUDA for GPU acceleration
2023-12-03 20:12:32 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_tensors: mem required = 3891.34 MB (+ 4096.00 MB per state)
2023-12-03 20:12:32 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_tensors: offloading 0 repeating layers to GPU
2023-12-03 20:12:32 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_tensors: offloaded 0/35 layers to GPU
2023-12-03 20:12:32 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llm_load_tensors: VRAM used: 0 MB
2023-12-03 20:12:35 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr ..................................................................................................
2023-12-03 20:12:37 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_new_context_with_model: kv self size = 4096.00 MB
2023-12-03 20:12:37 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_new_context_with_model: compute buffer total size = 281.47 MB
2023-12-03 20:12:37 1:12AM DBG GRPC(luna-ai-llama2-uncensored.Q4_K_M.gguf-127.0.0.1:45223): stderr llama_new_context_with_model: VRAM scratch buffer: 280.00 MB
2023-12-03 20:13:13 [127.0.0.1]:34806 200 - GET /readyz
Is there any solution/workaround? I get the same error with various models when deployed to EKS (locally I can run it fine on minikube).
Another frustrated user here; I can't get anything to work, including the 'Getting started' instructions. Trying a cublas build with Docker. I get the feeling the Local.AI architecture is failing to surface errors from the back-end that would tell the problem. Requested #1416
same issue
In case it helps, I was facing similar errors trying to host llama2 model on AWS EKS with A10 gpu. First we upgraded Nvidia to latest 5.* driver. Second, I needed to also deploy a model yaml file to set f16/gpu_layers (it wasn't enough just to have those as env params as the helm chart pushes). The LocalAI API methods below help you do it all easily -- can search for a model via their gallery & push it with the settings you want (here you can also specify the backend, so it doesn't guess):
curl http://localhost:8000/models/available | jq '.[] | select(.name | contains("llama2"))'
curl http://localhost:8000/models/apply -H "Content-Type: application/json" -d '{
"id": "thebloke__llama2-chat-ayt-13b-gguf__llama2-chat-ayt-13b.q5_k_s.gguf",
"overrides": {
"backend": "llama",
"f16": true,
"gpu_layers": 43
}
}'
@mudler
https://github.com/mudler/LocalAI/blob/3851b51d98ee6dce4e05aa6b045e53917b39f267/backend/cpp/llama/grpc-server.cpp#L2284-L2369
1 not unimplemented grpc::Status Embedding(ServerContext context, const backend::PredictOptions request, backend::EmbeddingResult* reply) method.
if backend =llama-cpp:
{ "error": { "code": 500, "message": "rpc error: code = Unknown desc = unimplemented", "type": "" } }
Bert.cpp has been integrated into llama.cpp! See https://github.com/ggerganov/llama.cpp/pull/5423 and the discussions
Updated forks: iamlemec/bert.cpp xyzhang626/embeddings.cpp
2 backend/go/llm/llama not used
What went wrong? Settings?
quay.io/go-skynet/local-ai:master-cublas-cuda11
request:
response:
log: