Closed Abdulrahman392011 closed 2 months ago
You need to load the embedding model via the embeddings API endpoint, passing `keep_alive: -1` so it stays in memory:
$ curl http://localhost:11434/api/embed -d '{"model": "mxbai-embed-large:latest", "keep_alive": -1}'
$ ollama ps
NAME                        ID              SIZE      PROCESSOR    UNTIL
mxbai-embed-large:latest    468836162de7    1.2 GB    100% GPU     Forever
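For scripted use, the same preload can be done from any HTTP client; the sketch below only builds the JSON body that `/api/embed` expects (`embed_body` is an illustrative helper name, not part of any Ollama client library). Per the Ollama API, `keep_alive` accepts a duration string like `"5m"`, `0` to unload immediately, or `-1` to keep the model loaded indefinitely.

```python
import json

def embed_body(model, text=None, keep_alive=-1):
    """Build the JSON body for Ollama's /api/embed endpoint.

    keep_alive=-1 pins the model in memory indefinitely;
    omitting "input" makes it a pure preload request.
    """
    body = {"model": model, "keep_alive": keep_alive}
    if text is not None:
        body["input"] = text
    return json.dumps(body)

# Preload request (no input): loads the model and keeps it resident.
print(embed_body("mxbai-embed-large:latest"))
# Subsequent embedding requests reuse the already-loaded model.
print(embed_body("mxbai-embed-large:latest", "hello world"))
```

POSTing the first body once at startup (e.g. with `curl` or `requests`) avoids the load/unload cycle on every embedding call.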
Thank you. I tried it and it works.
I use embedding models a lot, and every time it loads the model, does the vectoring, and then unloads it immediately. When I try to keep it alive with this command:
$ curl http://localhost:11434/api/generate -d '{"model": "mxbai-embed-large:latest", "keep_alive": -1}'
it tells me that this model isn't a generative model and refuses to keep it alive. Please add support for this to reduce latency: it copies the 600 megabytes every time and then deletes them, which adds a couple of seconds to an operation that should take only a second.