Closed lenaxia closed 11 months ago
shortened context window fixed it
How do you shorten the context window? In the model config?
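For reference, a sketch of what a shortened context window could look like in a LocalAI model YAML config (field names follow LocalAI's documented config format; the model name and value are illustrative — the bert.cpp log above reports `n_max_tokens = 512`, so setting `context_size` at or below that should avoid overflowing the pool):

```yaml
# Hypothetical model config sketch for a bert-embeddings model.
name: bert-embeddings
backend: bert-embeddings
embeddings: true
context_size: 512   # keep at or below the model's n_max_tokens (512 here)
parameters:
  model: bert-MiniLM-L6-v2q4_1.bin
```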
InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0x140001aa848 MirostatTAU:0x140001aa840 Mirostat:0x140001aa838 NGPULayers:0x140001aa850 MMap:0x140001aa858 MMlock:0x140001aa859 LowVRAM:0x140001aa859 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0x140001aa808 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}
11:22PM INF Loading model 'bert-MiniLM-L6-v2q4_1.bin' with backend bert-embeddings
11:22PM DBG Model already loaded in memory: bert-MiniLM-L6-v2q4_1.bin
11:22PM WRN GRPC Model not responding: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:59492: connect: connection refused"
11:22PM WRN Deleting the process in order to recreate it
11:22PM DBG GRPC Process is not responding: bert-MiniLM-L6-v2q4_1.bin
11:22PM DBG Loading model in memory from file: /Users/block/code/data/models/bert-MiniLM-L6-v2q4_1.bin
11:22PM DBG Loading Model bert-MiniLM-L6-v2q4_1.bin with gRPC (file: /Users/block/code/data/models/bert-MiniLM-L6-v2q4_1.bin) (backend: bert-embeddings): {backendString:bert-embeddings model:bert-MiniLM-L6-v2q4_1.bin threads:10 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0x140001f3000 externalBackends:map[sentencetransformers:/Users/block/code/LocalAI/backend/python/sentencetransformers/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
11:22PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/bert-embeddings
11:22PM DBG GRPC Service for bert-MiniLM-L6-v2q4_1.bin will be running at: '127.0.0.1:59527'
11:22PM DBG GRPC Service state dir: /var/folders/t0/y4k0vcfx5_bd9qx7pl7lbj9h0000gn/T/go-processmanager380860985
11:22PM DBG GRPC Service Started
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr 2024/03/20 23:22:55 gRPC Server listening at 127.0.0.1:59527
11:22PM DBG GRPC Service Ready
11:22PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:bert-MiniLM-L6-v2q4_1.bin ContextSize:1024 Seed:2084307436 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:true NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:10 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/Users/block/code/data/models/bert-MiniLM-L6-v2q4_1.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type:}
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout bert_load_from_file: loading model from '/Users/block/code/data/models/bert-MiniLM-L6-v2q4_1.bin' - please wait ...
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout bert_load_from_file: n_vocab = 30522
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout bert_load_from_file: n_max_tokens = 512
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout bert_load_from_file: n_embd = 384
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout bert_load_from_file: n_intermediate = 1536
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout bert_load_from_file: n_head = 12
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout bert_load_from_file: n_layer = 6
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout bert_load_from_file: f16 = 3
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout bert_load_from_file: ggml ctx size = 16.26 MB
11:22PM INF Loading model 'bert-MiniLM-L6-v2q4_1.bin' with backend bert-embeddings
11:22PM DBG Model already loaded in memory: bert-MiniLM-L6-v2q4_1.bin
.................
11:22PM INF Loading model 'bert-MiniLM-L6-v2q4_1.bin' with backend bert-embeddings
11:22PM DBG Model already loaded in memory: bert-MiniLM-L6-v2q4_1.bin
11:22PM INF Loading model 'bert-MiniLM-L6-v2q4_1.bin' with backend bert-embeddings
11:22PM DBG Model already loaded in memory: bert-MiniLM-L6-v2q4_1.bin
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr SIGSEGV: segmentation violation
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr PC=0x102fa2600 m=4 sigcode=2
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr signal arrived during cgo execution
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr goroutine 194 [syscall]:
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr runtime.cgocall(0x102f8aa9c, 0x140001377a8)
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr /usr/local/go/src/runtime/cgocall.go:157 +0x44 fp=0x14000137770 sp=0x14000137730 pc=0x102bf7804
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr github.com/go-skynet/go-bert%2ecpp._Cfunc_bert_embeddings(0x1557044e0, 0x155610a00, 0x14000404000)
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr _cgo_gotypes.go:138 +0x34 fp=0x140001377a0 sp=0x14000137770 pc=0x102cdb624
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr github.com/go-skynet/go-bert%2ecpp.(*Bert).Embeddings.func1(0x1557258f0?, 0xa?, 0x1557044e0?)
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr /Users/block/code/LocalAI/sources/go-bert/gobert.go:38 +0x74 fp=0x140001377f0 sp=0x140001377a0 pc=0x102cdbca4
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr github.com/go-skynet/go-bert%2ecpp.(*Bert).Embeddings(0x140001c6d80?, {0x1400019c800, 0x3e7}, {0x140001378f0, 0x1, 0x140001f8900?})
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr /Users/block/code/LocalAI/sources/go-bert/gobert.go:38 +0xe8 fp=0x14000137860 sp=0x140001377f0 pc=0x102cdbac8
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr main.(*Embeddings).Embeddings(0x14000137958?, 0x102bfe084?)
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr /Users/block/code/LocalAI/backend/go/llm/bert/bert.go:33 +0x90 fp=0x14000137900 sp=0x14000137860 pc=0x102f8a070
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr github.com/go-skynet/LocalAI/pkg/grpc.(*server).Embedding(0x14000192c90, {0x1400020e180?, 0x14000218380?}, 0x0?)
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr /Users/block/code/LocalAI/pkg/grpc/server.go:37 +0xb8 fp=0x14000137990 sp=0x14000137900 pc=0x102f88488
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr github.com/go-skynet/LocalAI/pkg/grpc/proto._Backend_Embedding_Handler({0x10313d6c0?, 0x14000192c90}, {0x103170358, 0x140002bf980}, 0x14000218380, 0x0)
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr /Users/block/code/LocalAI/pkg/grpc/proto/backend_grpc.pb.go:303 +0x164 fp=0x140001379f0 sp=0x14000137990 pc=0x102f82bc4
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr google.golang.org/grpc.(*Server).processUnaryRPC(0x140002481e0, {0x103170358, 0x140002bf8c0}, {0x103173878, 0x1400008a820}, 0x140001f8240, 0x1400025e9f0, 0x10342ea68, 0x0)
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr /Users/block/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1343 +0xb8c fp=0x14000137de0 sp=0x140001379f0 pc=0x102f6d33c
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr google.golang.org/grpc.(*Server).handleStream(0x140002481e0, {0x103173878, 0x1400008a820}, 0x140001f8240)
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr /Users/block/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1737 +0x988 fp=0x14000137f60 sp=0x14000137de0 pc=0x102f714e8
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stderr google.golang.org/grpc.(*Server).serveStreams.func1.1()
.............
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout bert_load_from_file: ............ done
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout bert_load_from_file: model size = 16.24 MB / num tensors = 101
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout bert_load_from_file: mem_per_token 452 KB, mem_per_input 248 MB
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout loaded
11:22PM DBG GRPC(bert-MiniLM-L6-v2q4_1.bin-127.0.0.1:59527): stdout ggml_new_tensor_impl: not enough space in the context's memory pool (needed 271388624, available 260919296)
[127.0.0.1]:59525 500 - POST /v1/embeddings
LocalAI version: v2.0.0

10:23PM INF LocalAI version: v2.0.0 (238fec244ae6c9a66bc7fafd76c7e14671110a6f)

Environment, CPU architecture, OS, and Version:

Describe the bug: Any embedding request with long text generates a segfault.

To Reproduce:

Expected behavior: Return an embedding vector.

Logs:

Additional context:
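A possible client-side workaround while the crash stands: clip the input before it reaches the backend, since bert.cpp reports `n_max_tokens = 512` and longer inputs overflow the ggml memory pool (`not enough space in the context's memory pool` in the log above). This is a hedged sketch: the helper name is made up, and word-based clipping is only a rough proxy for BERT tokens, which can split words further.

```python
def clip_for_embedding(text: str, max_words: int = 300) -> str:
    """Crudely clip text to at most max_words whitespace-separated words.

    A conservative max_words is used because BERT's WordPiece tokenizer
    may emit more than one token per word; 300 words leaves headroom
    under the model's 512-token limit.
    """
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words])


# Example: a 1000-word input is clipped to 300 words before being sent
# to the /v1/embeddings endpoint.
long_text = "word " * 1000
clipped = clip_for_embedding(long_text)
print(len(clipped.split()))  # 300
```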