ollama / ollama-python

Ollama Python library

ollama server hangs constantly #69

Closed rihp closed 4 months ago

rihp commented 4 months ago

My ollama server hangs constantly: it takes in queries and my GPU makes noise, but it doesn't respond back in the Jupyter environment unless I restart the ollama process a couple of times. Any idea how to debug what might be making it just hang thinking? I'm on Linux using the VS Code Insiders version.

I've set it up so that the ollama server restarts after 20 seconds, but that only works about 10% of the time and is very time-consuming.
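One quick way to tell whether the HTTP server is still answering at all (as opposed to a single generation being hung) is to hit a cheap endpoint with a short timeout. A minimal sketch, assuming the default port and the requests package:

import requests

# Probe a cheap endpoint: /api/tags just lists local models, so a
# healthy server answers almost instantly even while a model is loaded
try:
    r = requests.get("http://localhost:11434/api/tags", timeout=5)
    print("server responding:", r.status_code)
except requests.exceptions.Timeout:
    print("no response within 5 seconds; server looks hung")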

Here is the output of systemctl status ollama.service:

● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2024-02-17 00:50:12 CET; 3min 5s ago
   Main PID: 7148 (ollama)
      Tasks: 21 (limit: 19015)
     Memory: 1.6G
        CPU: 14.581s
     CGroup: /system.slice/ollama.service
             └─7148 /usr/local/bin/ollama serve

feb 17 00:50:16 ImgOracle ollama[7148]: llama_new_context_with_model: freq_base  = 1000000.0
feb 17 00:50:16 ImgOracle ollama[7148]: llama_new_context_with_model: freq_scale = 1
feb 17 00:50:16 ImgOracle ollama[7148]: llama_kv_cache_init: VRAM kv self = 256.00 MB
feb 17 00:50:16 ImgOracle ollama[7148]: llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
feb 17 00:50:16 ImgOracle ollama[7148]: llama_build_graph: non-view tensors processed: 676/676
feb 17 00:50:16 ImgOracle ollama[7148]: llama_new_context_with_model: compute buffer total size = 159.19 MiB
feb 17 00:50:16 ImgOracle ollama[7148]: llama_new_context_with_model: VRAM scratch buffer: 156.00 MiB
feb 17 00:50:16 ImgOracle ollama[7148]: llama_new_context_with_model: total VRAM used: 4259.56 MiB (model: 3847.55 MiB, context: 412.00 MiB)
feb 17 00:50:16 ImgOracle ollama[7148]: 2024/02/17 00:50:16 ext_server_common.go:151: Starting internal llama main loop
feb 17 00:50:16 ImgOracle ollama[7148]: 2024/02/17 00:50:16 ext_server_common.go:165: loaded 0 images
rihp commented 4 months ago

I'm using something like

from langchain.llms import Ollama

def ask_mistral(question, num_predict=2768, k=25, timeout=20):
    # k and timeout are unused here; the timeout is enforced by an
    # outer retry loop that restarts the server (described below)
    ollama = Ollama(base_url='http://localhost:11434', model="mistral", num_predict=num_predict)
    response = ollama(question)
    return response

but I have it set up to retry after 20 seconds, restarting the ollama server by killing the process (which automatically spawns a new ollama instance 2 seconds later). That is working very rarely, though.
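The retry-and-restart loop could look something like the sketch below. It is only an illustration: it assumes the service is managed by systemd (so killing the process respawns it, per the unit shown above), and pkill may need sudo since the service typically runs under its own user.

import subprocess
import time
import concurrent.futures

from langchain.llms import Ollama

def ask_mistral_with_restart(question, num_predict=2768, timeout=20, retries=3):
    # Hypothetical wrapper: run the query in a worker thread so it can
    # be timed out, and kill the server if it hangs; systemd respawns it
    for attempt in range(retries):
        llm = Ollama(base_url="http://localhost:11434",
                     model="mistral", num_predict=num_predict)
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(llm, question)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            # The query hung: kill the server process (may need sudo)
            # and give systemd time to bring it back before retrying
            subprocess.run(["pkill", "-f", "ollama serve"])
            time.sleep(5)
        finally:
            # Don't block on the (possibly hung) worker thread
            pool.shutdown(wait=False)
    raise RuntimeError(f"no answer from ollama after {retries} attempts")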

rihp commented 4 months ago

Updating ollama seems to decrease the frequency of this issue?

curl https://ollama.ai/install.sh | sh

stevengans commented 4 months ago

@rihp this should have fixed the issue: https://github.com/ollama/ollama/pull/2459

Are you using v0.1.25 of ollama?
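To check which build the server is actually running, ollama --version works from the CLI; from Python, something like the following should also work (assuming the default port, since /api/version is part of the server's REST API):

import requests

# Ask the running server for its version, e.g. {'version': '0.1.25'}
print(requests.get("http://localhost:11434/api/version", timeout=5).json())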

rihp commented 4 months ago

Upgrading to 0.1.25 lowered the frequency of the issue. I'll share logs later this week!

mxyng commented 4 months ago

This repo is for the Python client library. For server issues, please see ollama/ollama.
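For reference, the same query made through this library (rather than LangChain) would look roughly like the following, assuming a recent ollama-python:

import ollama

# Generate a completion with the official Python client; the result
# carries the model's text under the 'response' key
result = ollama.generate(model="mistral", prompt="Why is the sky blue?")
print(result["response"])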