Open yanwun opened 2 months ago
Currently the model is only loaded when you actually chat. When a new model is added, it is loaded briefly just to verify that it is available. If you want the model loaded on demand for each chat, rather than kept resident in GPU memory, you can try Ollama's keep_alive setting.
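For reference, here is a minimal sketch of passing keep_alive through Ollama's REST API, assuming a default local install at http://localhost:11434 and a hypothetical model name. A keep_alive of 0 asks Ollama to unload the model right after the response, while a duration string such as "5m" keeps it resident for that long; the same default can also be set server-wide with the OLLAMA_KEEP_ALIVE environment variable.

```python
import requests

# Minimal sketch: generate a reply, then have Ollama unload the model
# immediately afterwards by setting keep_alive to 0 (the default is "5m").
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "my-finetune",  # hypothetical model name
        "prompt": "Hello",
        "stream": False,
        "keep_alive": 0,  # 0 = unload right away; -1 keeps it loaded indefinitely
    },
    timeout=120,
)
print(resp.json()["response"])
```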
Is there an existing issue for the same feature request?
Is your feature request related to a problem?
No response
Describe the feature you'd like
Hi there, I have a question. I use Ollama as my LLM provider, and I know that when an LLM is added, the model gets loaded into the Ollama service. Would it be possible to add an LLM without loading the model, and only load it when a chat starts? If multiple users each use a different fine-tuned model in Ollama, GPU memory will be exhausted.
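As a side note on the GPU-memory concern, a quick way to see which models are actually resident is Ollama's list-running-models endpoint. A small sketch, again assuming a default local install:

```python
import requests

# Sketch: list the models Ollama currently has loaded (GET /api/ps)
# and how much of each sits in GPU memory ("size_vram", in bytes).
loaded = requests.get("http://localhost:11434/api/ps", timeout=10).json()
for m in loaded.get("models", []):
    print(m["name"], m.get("size_vram"))
```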
Describe implementation you've considered
No response
Documentation, adoption, use case
No response
Additional information
No response