mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more models architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
https://localai.io
MIT License

Error on calling an embedding model, error reading from server: EOF #3886

Open EmanuelJr opened 1 month ago

EmanuelJr commented 1 month ago

LocalAI version: localai/localai:master-cublas-cuda12-ffmpeg

Environment, CPU architecture, OS, and Version:

Describe the bug Error `rpc error: code = Unavailable desc = error reading from server: EOF` when calling /embeddings for the model mixedbread-ai/mxbai-embed-large-v1.

To Reproduce Download the model and use the following configuration. I also tried with `mmap: true`, without `f16: true`, and some other variations:

```yaml
name: mxbai-embed-large
backend: llama
embeddings: true
f16: true
parameters:
  model: mxbai-embed-large-v1-f16.gguf
```

Curl used:

```sh
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
  "input": "Your text string goes here",
  "model": "mxbai-embed-large"
}'
```
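For completeness, a hedged sketch of the same request from Python using only the standard library. The endpoint and payload shape come from the curl above; the response parsing assumes the usual OpenAI-style `data[0].embedding` layout:

```python
import json
import urllib.request

def build_payload(text, model):
    """JSON body for the OpenAI-compatible /embeddings endpoint."""
    return {"input": text, "model": model}

def embed(text, model="mxbai-embed-large", base_url="http://localhost:8080"):
    """POST the payload and return the first embedding vector."""
    req = urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps(build_payload(text, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style response: the vector sits at data[0].embedding
    return body["data"][0]["embedding"]
```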

Expected behavior Should return the embedding of the input text.

Logs

12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stderr llama_new_context_with_model: graph splits = 2
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stderr common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stdout {"timestamp":1729426198,"level":"INFO","function":"initialize","line":547,"message":"initializing slots","n_slots":1}
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stdout {"timestamp":1729426198,"level":"INFO","function":"initialize","line":556,"message":"new slot","slot_id":0,"n_ctx_slot":512}
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stdout {"timestamp":1729426198,"level":"INFO","function":"launch_slot_with_data","line":929,"message":"slot is processing task","slot_id":0,"task_id":0}
12:09PM DBG GRPC(mxbai-embed-large-127.0.0.1:33141): stdout {"timestamp":1729426198,"level":"INFO","function":"update_slots","line":1827,"message":"kv cache rm [p0, end)","slot_id":0,"task_id":0,"p0":0}
12:09PM ERR Server error error="rpc error: code = Unavailable desc = error reading from server: EOF" ip=127.0.0.1 latency=3.287815242s method=POST status=500 url=/embeddings

Additional context

etlweather commented 4 weeks ago

I get the same error with nomic-embed-text-v1.5.Q8_0.gguf and mxbai-embed-large-v1.q8_0.gguf (without the f16 param set).

I tried others. Basically the only embedding model I have gotten working so far is MiniLM-L6-v2q4_0.bin using the bert-embeddings backend. That one works, but if the input is too large it fails with a 500 error.
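A possible client-side workaround for the 500 on large inputs is to split the text into smaller pieces and embed each piece separately. A minimal sketch; the `max_chars` budget is a guess, not a documented limit:

```python
def chunk_text(text, max_chars=800):
    """Split text on whitespace into chunks of at most max_chars characters.

    A single word longer than max_chars becomes its own (oversized) chunk.
    """
    words = text.split()
    chunks, current, size = [], [], 0
    for word in words:
        # +1 accounts for the joining space
        if size + len(word) + 1 > max_chars and current:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(word)
        size += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be sent to /embeddings on its own, with the resulting vectors averaged or stored per chunk depending on the use case.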

EmanuelJr commented 3 weeks ago

@etlweather I did get it to work with the sentencetransformers backend; it's simple to set up, like the example in the docs. I'd still prefer to use the llama backend, though.
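For reference, a sentencetransformers configuration along the lines of the docs example might look like the following. The model name and config values here are assumptions for illustration, not a tested setup:

```yaml
name: mxbai-embed-large-st
backend: sentencetransformers
embeddings: true
parameters:
  model: mixedbread-ai/mxbai-embed-large-v1
```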

etlweather commented 3 weeks ago

@EmanuelJr sentencetransformers would be fine; it just needs to accept a large input. But so far, none of the ones I tried work either. They fail to load... I haven't had time to look further into this yet.