0xSchellen closed this 3 months ago
Hey, you can do that:
let request = GenerationRequest::new(...).keep_alive(KeepAlive::Indefinitely);
Is that what you meant?
Hi! Thanks for the response!
This works fine for the Completions API.
But I can't use it with generate_embeddings.
let response = ollama
    .generate_embeddings(model.to_string(), prompt, None)
    .await?;
Sorry for the delay, I'll fix that
It seems there is no parameter to set "keep_alive": -1 in the GenerationOptions struct.
The idea is to load the embedding model and keep it in memory.
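For reference, Ollama's REST API itself accepts a "keep_alive" field on POST /api/embeddings, and a negative value keeps the model loaded. Until the crate exposes this, one possible workaround is to build the request body by hand and send it with any HTTP client. Below is a minimal sketch using only the standard library. The helper names `json_escape` and `embeddings_body_keep_loaded` are hypothetical and not part of ollama-rs, and sending the request is left out.

```rust
// Hypothetical helpers, NOT part of ollama-rs: they only build the JSON body
// for Ollama's POST /api/embeddings endpoint with "keep_alive": -1.

/// Escape a string so it can be placed inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build an embeddings request body that asks the server to keep the
/// model loaded indefinitely ("keep_alive": -1).
fn embeddings_body_keep_loaded(model: &str, prompt: &str) -> String {
    format!(
        "{{\"model\":\"{}\",\"prompt\":\"{}\",\"keep_alive\":-1}}",
        json_escape(model),
        json_escape(prompt)
    )
}
```

The resulting string can be posted to the server's /api/embeddings endpoint with a Content-Type of application/json. In a real project a JSON library would be a safer choice than manual escaping; the hand-rolled version is only there to keep the sketch dependency-free.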