infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Bug]: Call Ollama can not keep alive #980

Open rickywu opened 1 month ago

rickywu commented 1 month ago

Is there an existing issue for the same bug?

Branch name

0.6.0

Commit ID

6c32f80

Other environment information

2080 Ti
WSL2 on Windows 11
Docker

Actual behavior

When calling the Ollama LLM service, the model is loaded into GPU memory and unloaded again on every chat.

Expected behavior

Ollama supports a keep_alive parameter in its call parameters; any negative number keeps the model loaded in memory indefinitely.

From the Ollama docs: use the keep_alive parameter with either the /api/generate or /api/chat API endpoints to control how long the model is left in memory. A negative number (e.g. -1) keeps the model loaded, while '0' unloads the model immediately after generating a response.
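A minimal sketch of such a call, assuming a local Ollama server on the default localhost:11434 with mistral:latest already pulled (the endpoint and field names come from the Ollama API; the surrounding script is illustrative, not RAGFlow code):

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # default Ollama port; adjust if serving elsewhere

def chat(prompt: str) -> str:
    # keep_alive=-1 asks Ollama to keep the model in GPU memory indefinitely,
    # instead of unloading it after the default timeout (or immediately with 0).
    resp = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={
            "model": "mistral:latest",
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
            "keep_alive": -1,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

if __name__ == "__main__":
    print(chat("Why is the sky blue?"))
```

If the client (here, RAGFlow) does not expose keep_alive, a similar effect can be obtained server-side by starting Ollama with the OLLAMA_KEEP_ALIVE environment variable set, e.g. `OLLAMA_KEEP_ALIVE=-1 ollama serve`.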

Steps to reproduce

1. Configure Ollama to serve an LLM.
2. Use RAGFlow to chat with it.

Additional information

No response

ndmil commented 1 month ago

I solved this issue with the method below.

Run this in a terminal:

  OLLAMA_HOST=0.0.0.0:11435 ollama pull mistral:latest

Run this in another instance of terminal:

  OLLAMA_HOST=0.0.0.0:11435 ollama serve

If you get an error, swap the order: start `ollama serve` first, then run the pull command.

After that, you need the inet address of the Linux instance in WSL: run `ifconfig` (or `ip addr`) in the WSL terminal and note the inet address. Enter this address into the Ollama model addition panel in RAGFlow, as shown in the screenshot below.

[screenshot: Ollama model addition panel with the WSL inet address entered as the base URL]
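Before adding the model in RAGFlow, it can help to confirm that the address and port are reachable by hitting Ollama's /api/tags endpoint, which lists the pulled models. A minimal sketch, assuming Ollama is serving on port 11435 as above; the 172.20.0.2 address is a placeholder for your own WSL inet address:

```python
import requests

# Placeholder address: replace with the inet address reported by ifconfig inside WSL.
OLLAMA_BASE = "http://172.20.0.2:11435"

# /api/tags returns the models available on the Ollama server; a 200 response
# that lists mistral:latest means RAGFlow should be able to reach it too.
resp = requests.get(f"{OLLAMA_BASE}/api/tags", timeout=10)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])
```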