mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

Issue with LocalAI Compilation: Wrong Tensor Shape #855

Open maxiannunziata opened 1 year ago

maxiannunziata commented 1 year ago

LocalAI version:

v1.23.0

Environment, CPU architecture, OS, and Version:

Linux cocopilot 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug

I compiled LocalAI and am trying to run it with the model from this link: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin. However, when the model loads, I receive the following error:

error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024

Based on what I've found online, it appears that passing the -gqa 8 parameter could solve this issue.

I am a bit confused about where and how to set this option for LocalAI. Is the -gqa 8 parameter something that needs to be set when the model file is generated, or can it be configured when running LocalAI?

Any guidance or advice would be greatly appreciated.
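
As far as I understand the shape mismatch (a rough sketch; the head counts below are assumed from the published Llama 2 70B architecture, not taken from the logs): Llama 2 70B uses grouped-query attention, where groups of 8 query heads share a single key/value head, so the K/V projection matrices are an eighth as wide as a loader that assumes one K/V head per query head expects:

```
n_embd    = 8192
n_head    = 64                    # query heads; head_dim = 8192 / 64 = 128
n_head_kv = n_head / 8 = 8        # with gqa = 8: 8 query heads per K/V head
wk width  = n_head_kv * head_dim  # 8 * 128 = 1024  -> 8192 x 1024 (what the file contains)
default   = n_head * head_dim     # 64 * 128 = 8192 -> 8192 x 8192 (what the loader expected)
```

This matches the error message exactly, which is why the loader needs to be told gqa = 8 for this model.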

Logs

GRPC Service Ready
2:11AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:/mnt/hulk/models/llama-2-70b-chat.ggmlv3.q4_0.bin ContextSize:15000 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:14 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0}
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /home/anz/LocalAI/go-rwkv/rwkv.cpp/rwkv.cpp:250: header.magic == 0x67676d66
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr Invalid file header
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /home/anz/LocalAI/go-rwkv/rwkv.cpp/rwkv.cpp:1132: rwkv_fread_file_header(file.file, model.header)
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /home/anz/LocalAI/go-rwkv/rwkv.cpp/rwkv.cpp:1266: rwkv_instance_from_file(file_path, *instance.get())
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr panic: runtime error: invalid memory address or nil pointer dereference
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x524cf4]
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr goroutine 10 [running]:
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr github.com/donomii/go-rwkv%2ecpp.(*Context).GetStateBufferElementCount.func1(0xc00007bfd0?)
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /home/anz/LocalAI/go-rwkv/wrapper.go:63 +0x14
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr github.com/donomii/go-rwkv%2ecpp.(*Context).GetStateBufferElementCount(0xc0000ba6c0?)
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /home/anz/LocalAI/go-rwkv/wrapper.go:63 +0x19
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr github.com/donomii/go-rwkv%2ecpp.LoadFiles({0xc0000ba6c0?, 0xc0000ba6d1?}, {0xc0000ba700, 0x40}, 0x6e?)
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /home/anz/LocalAI/go-rwkv/wrapper.go:131 +0x5d
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr github.com/go-skynet/LocalAI/pkg/grpc/llm/rwkv.(*LLM).Load(0xc0000142a0, 0xc0001b4750)
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /home/anz/LocalAI/pkg/grpc/llm/rwkv/rwkv.go:25 +0xcf
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr github.com/go-skynet/LocalAI/pkg/grpc.(*server).LoadModel(0x912200?, {0xc0001b4750?, 0x5cb8c6?}, 0x0?)
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /home/anz/LocalAI/pkg/grpc/server.go:42 +0x28
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr github.com/go-skynet/LocalAI/pkg/grpc/proto._Backend_LoadModel_Handler({0x8f1300?, 0xc00007bd20}, {0x9d4af0, 0xc0001aef90}, 0xc0000f9dc0, 0x0)
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /home/anz/LocalAI/pkg/grpc/proto/backend_grpc.pb.go:236 +0x170
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001a61e0, {0x9d7778, 0xc000326000}, 0xc0000e1680, 0xc0001aea20, 0xc8a530, 0x0)
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /root/go/pkg/mod/google.golang.org/grpc@v1.57.0/server.go:1360 +0xe23
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr google.golang.org/grpc.(*Server).handleStream(0xc0001a61e0, {0x9d7778, 0xc000326000}, 0xc0000e1680, 0x0)
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /root/go/pkg/mod/google.golang.org/grpc@v1.57.0/server.go:1737 +0xa36
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr google.golang.org/grpc.(*Server).serveStreams.func1.1()
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /root/go/pkg/mod/google.golang.org/grpc@v1.57.0/server.go:982 +0x98
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr created by google.golang.org/grpc.(*Server).serveStreams.func1
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:38615): stderr /root/go/pkg/mod/google.golang.org/grpc@v1.57.0/server.go:980 +0x18c
2:11AM DBG [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
2:11AM DBG [whisper] Attempting to load
2:11AM DBG Loading model whisper from llama-2-70b-chat.ggmlv3.q4_0.bin
2:11AM DBG Loading model in memory from file: /mnt/hulk/models/llama-2-70b-chat.ggmlv3.q4_0.bin
2:11AM DBG Loading GRPC Model whisper: {backendString:whisper modelFile:llama-2-70b-chat.ggmlv3.q4_0.bin threads:14 assetDir:/tmp/localai/backend_data context:0xc00011e010 gRPCOptions:0xc0001a0cf0 externalBackends:map[]}
2:11AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper
2:11AM DBG GRPC Service for llama-2-70b-chat.ggmlv3.q4_0.bin will be running at: '127.0.0.1:41081'
2:11AM DBG GRPC Service state dir: /tmp/go-processmanager3398811853
2:11AM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:41081: connect: connection refused"
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:41081): stderr 2023/08/02 02:11:02 gRPC Server listening at 127.0.0.1:41081
2:11AM DBG GRPC Service Ready
2:11AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:/mnt/hulk/models/llama-2-70b-chat.ggmlv3.q4_0.bin ContextSize:15000 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:14 LibrarySearchPath:/tmp/localai/backend_data/backend-assets/gpt4all RopeFreqBase:0 RopeFreqScale:0}
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:41081): stderr whisper_init_from_file_no_state: loading model from '/mnt/hulk/models/llama-2-70b-chat.ggmlv3.q4_0.bin'
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:41081): stderr whisper_model_load: loading model
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:41081): stderr whisper_model_load: invalid model data (bad magic)
2:11AM DBG GRPC(llama-2-70b-chat.ggmlv3.q4_0.bin-127.0.0.1:41081): stderr whisper_init_no_state: failed to load model
2:11AM DBG [whisper] Fails: could not load model: rpc error: code = Unknown desc = unable to load model
2:11AM DBG [stablediffusion] Attempting to load
2:11AM DBG Loading model stablediffusion from llama-2-70b-chat.ggmlv3.q4_0.bin
2:11AM DBG Loading model in memory from file: /mnt/hulk/models/llama-2-70b-chat.ggmlv3.q4_0.bin
2:11AM DBG Loading GRPC Model stablediffusion: {backendString:stablediffusion modelFile:llama-2-70b-chat.ggmlv3.q4_0.bin threads:14 assetDir:/tmp/localai/backend_data context:0xc00011e010 gRPCOptions:0xc0001a0cf0 externalBackends:map[]}
2:11AM DBG [stablediffusion] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/stablediffusion. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS
2:11AM DBG [piper] Attempting to load
2:11AM DBG Loading model piper from llama-2-70b-chat.ggmlv3.q4_0.bin
2:11AM DBG Loading model in memory from file: /mnt/hulk/models/llama-2-70b-chat.ggmlv3.q4_0.bin
2:11AM DBG Loading GRPC Model piper: {backendString:piper modelFile:llama-2-70b-chat.ggmlv3.q4_0.bin threads:14 assetDir:/tmp/localai/backend_data context:0xc00011e010 gRPCOptions:0xc0001a0cf0 externalBackends:map[]}
2:11AM DBG [piper] Fails: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/piper. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS

Additional context

mudler commented 1 year ago

ngqa is available since #860
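
For readers who hit the same error: below is a minimal sketch of how this might look in a LocalAI model YAML definition, assuming the option from #860 is exposed as a top-level ngqa key in the model config. The key name and its placement are inferred from the comment above, not copied from a verified config, so check #860 or the docs before relying on it.

```yaml
# llama-2-70b-chat.yaml — hypothetical sketch; the `ngqa` key is assumed from #860
name: llama-2-70b-chat
backend: llama
parameters:
  model: llama-2-70b-chat.ggmlv3.q4_0.bin
context_size: 4096
threads: 14
ngqa: 8   # grouped-query attention groups; the GGML Llama 2 70B files need 8
```

This also answers the original question: like llama.cpp's -gqa flag, this is a load-time option, so it belongs in LocalAI's runtime model configuration rather than being baked into the converted GGML file.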

localai-bot commented 1 year ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!

But... I can also be funny or helpful :smile_cat:, and I can generally provide good tips on where to look in the documentation or the code, based on what you wrote in the issue.

Don't engage in conversation with me; I don't support replying (yet)!

:warning::warning::warning::warning::warning:
