ollama / ollama-python

Ollama Python library
https://ollama.com
MIT License

504 Gateway Timeout - The server didn't respond in time #314

Open devilteo911 opened 5 days ago

devilteo911 commented 5 days ago

I don't know why, but I'm encountering this problem with the library. Here is my simple script:

import ollama

# llm_config and config are loaded elsewhere from my configuration files
client = ollama.Client(host=llm_config["base_url"], timeout=600)
client.chat(
    model=config["ollama"]["model"],
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

Here llm_config["base_url"] is the URL of the Ollama server (it runs on a serverless GPU), which I can reach successfully from open-webui; from there I can even query the model without issues. The model I'm using is qwen2.5:32b-instruct-q4_K_M and the GPU is an RTX A6000.
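
As a quick sanity check outside the library (a minimal sketch, reusing llm_config from the script above), a plain GET against the base URL should return Ollama's "Ollama is running" banner if the server itself is reachable:

import httpx

# A healthy Ollama server answers its root path with "Ollama is running";
# this bypasses the ollama library entirely and talks to the endpoint directly.
resp = httpx.get(llm_config["base_url"], timeout=30)
print(resp.status_code, resp.text)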

The traceback (client-side) is the following:

Traceback (most recent call last):
  File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 236, in chat
    return self._request_stream(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 99, in _request_stream
    return self._stream(*args, **kwargs) if stream else self._request(*args, **kwargs).json()
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 75, in _request
    raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

and this is what I see on the server side:

[GIN] 2024/11/07 - 22:04:21 | 500 | 50.001124922s |    xx.xx.xx.xx | POST     "/api/chat"

It happens every time after 50 seconds, even though the client timeout is set to 600 seconds. Am I missing something?
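
Since the cutoff is well under the client timeout, it looks like something between the client and the model is closing an idle connection. A streaming variant of the same call (a minimal sketch, reusing llm_config and config from above) returns chunks as soon as generation starts, which might keep such an intermediary from timing out while waiting for the full reply:

import ollama

client = ollama.Client(host=llm_config["base_url"], timeout=600)

# stream=True makes chat() return a generator of partial responses;
# chunks arrive as soon as the model starts generating, so the
# connection is never idle for long.
stream = client.chat(
    model=config["ollama"]["model"],
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)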

MatteoSid commented 5 days ago

I have the same issue.