ollama / ollama

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
https://ollama.com
MIT License

embeddings models keep_alive #6401

Closed Abdulrahman392011 closed 2 months ago

Abdulrahman392011 commented 2 months ago

I use embedding models a lot, and every time, Ollama loads the model, does the vectorizing, and then unloads it immediately. When I try to keep it alive with this command:

$ curl http://localhost:11434/api/generate -d '{"model": "mxbai-embed-large:latest", "keep_alive": -1}'

it tells me that this model isn't a generative model and refuses to keep it alive. Please add support for this to decrease the latency: it copies the 600 megabytes every time and then deletes them, which adds a couple of seconds to an operation that should take only a second.

rick-github commented 2 months ago

You need to load an embedding model via the embedding API endpoint:

$ curl http://localhost:11434/api/embed -d '{"model": "mxbai-embed-large:latest", "keep_alive": -1}'
$ ollama ps
NAME                            ID              SIZE    PROCESSOR       UNTIL   
mxbai-embed-large:latest        468836162de7    1.2 GB  100% GPU        Forever

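
The same warm-up call can be scripted instead of issued via curl. Below is a minimal Python sketch, assuming the default local Ollama address (`http://localhost:11434`) and using only the standard library; `build_embed_payload` and `warm_model` are illustrative names, not part of any Ollama client:

```python
import json
from urllib import request

# Default Ollama embeddings endpoint (assumption: server on localhost:11434).
OLLAMA_EMBED_URL = "http://localhost:11434/api/embed"

def build_embed_payload(model, text=None, keep_alive=-1):
    """Build the JSON body for /api/embed.

    keep_alive=-1 asks the server to keep the model loaded indefinitely;
    omitting "input" sends a load-only request, as in the curl example above.
    """
    payload = {"model": model, "keep_alive": keep_alive}
    if text is not None:
        payload["input"] = text
    return json.dumps(payload).encode()

def warm_model(model):
    # POST with no input: this loads the model and pins it in memory.
    req = request.Request(
        OLLAMA_EMBED_URL,
        data=build_embed_payload(model),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req)
```

After calling `warm_model("mxbai-embed-large:latest")`, `ollama ps` should show the model with `UNTIL Forever`, matching the output above.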
Abdulrahman392011 commented 2 months ago

Thank you. I tried it and it works.