infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Feature Request]: Add LLM #1973

Open yanwun opened 2 months ago

yanwun commented 2 months ago

Is there an existing issue for the same feature request?

Is your feature request related to a problem?

No response

Describe the feature you'd like

Hi there, I have a question. I use Ollama as my LLM provider, and I know that when the add-LLM function is called, it loads the model in the Ollama service. Would it be possible for adding an LLM not to load the model, and to load it only when a chat starts instead? If multiple users each use a different fine-tuned model in Ollama, it will exhaust GPU memory.

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

cyhasuka commented 2 months ago

Actually, the model is only loaded when chatting. When a new model is added, it is loaded briefly just to verify that it is available. If you want the model to be loaded for each chat rather than kept resident in GPU memory, you can try Ollama's keep_alive setting.
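
For reference, a minimal sketch of what that looks like at the Ollama API level, independent of RAGFlow; the host URL and model name below are placeholders. Passing keep_alive as 0 in a request asks Ollama to unload the model as soon as the response is returned, and the OLLAMA_KEEP_ALIVE environment variable sets the server-wide default.

```python
import requests

# Sketch only: ask Ollama to unload the model right after this request
# by setting keep_alive to 0 (seconds). Host and model name are examples.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",       # placeholder model name
        "prompt": "Hello",
        "stream": False,
        "keep_alive": 0,         # unload from GPU memory immediately after responding
    },
)
print(resp.json()["response"])
```

With this, each fine-tuned model occupies GPU memory only for the duration of its own request, at the cost of reloading the model on every chat.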