infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Bug]: Call Ollama can not keep alive #980

Open rickywu opened 1 month ago

rickywu commented 1 month ago

Is there an existing issue for the same bug?

Branch name

0.6.0

Commit ID

6c32f80

Other environment information

2080 Ti
WSL2 on Windows 11
Docker

Actual behavior

When calling the Ollama LLM service, the model is loaded into GPU memory and unloaded again on every chat.

Expected behavior

Ollama supports a keep_alive parameter in its call parameters; any negative number keeps the model loaded in memory indefinitely.

From the Ollama docs: use the keep_alive parameter with either the /api/generate or /api/chat API endpoints to control how long the model is left in memory. A negative number (e.g. -1) keeps the model loaded, while '0' unloads the model immediately after generating a response.
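A minimal sketch of such a call, assuming a local Ollama server on the default localhost:11434 with mistral:latest already pulled (the endpoint and field names come from the Ollama API; the surrounding script is illustrative, not RAGFlow code):

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # default Ollama port; adjust if serving elsewhere

def chat(prompt: str) -> str:
    # keep_alive=-1 asks Ollama to keep the model in GPU memory indefinitely,
    # instead of unloading it after the default timeout (or immediately with 0).
    resp = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={
            "model": "mistral:latest",
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
            "keep_alive": -1,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

if __name__ == "__main__":
    print(chat("Why is the sky blue?"))
```

If the client (here, RAGFlow) does not expose keep_alive, a similar effect can be obtained server-side by starting Ollama with the OLLAMA_KEEP_ALIVE environment variable set, e.g. `OLLAMA_KEEP_ALIVE=-1 ollama serve`.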

Steps to reproduce

1. Configure Ollama to serve an LLM.
2. Use RAGFlow to chat with it.

Additional information

No response

ndmil commented 1 month ago

I solved this issue with the method below.

Run this in a terminal:

  OLLAMA_HOST=0.0.0.0:11435 ollama pull mistral:latest

Run this in another instance of terminal:

  OLLAMA_HOST=0.0.0.0:11435 ollama serve

If you get an error, swap the order: start `ollama serve` first, then run the pull command.

After that, you need the inet address of the Linux instance in WSL: run `ifconfig` (or `ip addr`) in the WSL terminal and note the inet address. Enter this address into the Ollama model addition panel in RAGFlow, as shown in the screenshot below.

[screenshot: Ollama model addition panel with the WSL inet address entered as the base URL]
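Before adding the model in RAGFlow, it can help to confirm that the address and port are reachable by hitting Ollama's /api/tags endpoint, which lists the pulled models. A minimal sketch, assuming Ollama is serving on port 11435 as above; the 172.20.0.2 address is a placeholder for your own WSL inet address:

```python
import requests

# Placeholder address: replace with the inet address reported by ifconfig inside WSL.
OLLAMA_BASE = "http://172.20.0.2:11435"

# /api/tags returns the models available on the Ollama server; a 200 response
# that lists mistral:latest means RAGFlow should be able to reach it too.
resp = requests.get(f"{OLLAMA_BASE}/api/tags", timeout=10)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])
```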