mudler / LocalAI

The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

Bert custom embedding: could not load model: rpc error: code = Unknown desc = failed loading model #3094

Open IzzyHibbert opened 2 months ago

IzzyHibbert commented 2 months ago

LocalAI version: 2.19.3

Environment, CPU architecture, OS, and Version: Win 11, AMD Ryzen 5 4500 6-Core Processor, RTX 3090.

Describe the bug: I am trying to use a custom embedding model. The file is in GGUF format and the model type is BERT. My YAML file looks like this:

f16: true
gpu_layers: 40
name: ItaLegalEmb
backend: bert-embeddings
embeddings: true
parameters:
  model: ItaLegalEmb

The file is ItaLegalEmb.gguf and it is placed in the models folder. When I call the model with:

curl --location 'http://127.0.0.1:8080/v1/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "input": "Test",
    "model": "ItaLegalEmb"
}'

the response is:

"could not load model: rpc error: code = Unknown desc = failed loading model"

The debug log shows:

1:45PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/bert-embeddings
1:45PM DBG GRPC Service for ItaLegalEmb will be running at: '127.0.0.1:37719'
1:45PM DBG GRPC Service state dir: /tmp/go-processmanager1685794175
1:45PM DBG GRPC Service Started
1:45PM DBG GRPC(ItaLegalEmb-127.0.0.1:37719): stderr 2024/07/31 13:45:07 gRPC Server listening at 127.0.0.1:37719
1:45PM DBG GRPC Service Ready
1:45PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:ItaLegalEmb ContextSize:512 Seed:194720499 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:true NUMA:false NGPULayers:40 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/ItaLegalEmb Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
1:45PM DBG GRPC(ItaLegalEmb-127.0.0.1:37719): stderr bert_load_from_file: failed to open '/build/models/ItaLegalEmb'
1:45PM DBG GRPC(ItaLegalEmb-127.0.0.1:37719): stderr bert_bootstrap: failed to load model from '/build/models/ItaLegalEmb'
1:45PM ERR Server error error="could not load model: rpc error: code = Unknown desc = failed loading model"
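
One detail in the log above: the backend tries to open '/build/models/ItaLegalEmb' with no extension, while the file on disk is named ItaLegalEmb.gguf. Whether that mismatch is the actual cause is not confirmed in this thread. Below is a minimal Python sketch to check it, assuming it is run somewhere the models directory is visible (for example inside the container). The function name find_model_candidates is hypothetical and not part of LocalAI.

```python
from pathlib import Path


def find_model_candidates(models_dir: str, model_name: str) -> dict:
    """Check whether `model_name` exists verbatim in `models_dir` and
    list files that share its stem (e.g. ItaLegalEmb.gguf).

    Hypothetical helper for illustration; not a LocalAI API.
    """
    base = Path(models_dir)
    exact = base / model_name
    same_stem = []
    if base.is_dir():
        # Files whose name without the final extension equals the configured name.
        same_stem = sorted(
            p.name
            for p in base.iterdir()
            if p.is_file() and p.stem == model_name and p.name != model_name
        )
    return {
        "exact_path": str(exact),
        "exact_exists": exact.is_file(),
        "same_stem": same_stem,
    }
```

For the setup above, the call would be find_model_candidates("/build/models", "ItaLegalEmb"). If exact_exists is False and same_stem lists ItaLegalEmb.gguf, setting `model: ItaLegalEmb.gguf` under `parameters` in the YAML may be worth trying.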

rxcca commented 1 week ago

Having the same problem on Ubuntu 22.04.