I'm not sure why, but I'm running into this problem with the library. Here is my simple script:
import ollama

client = ollama.Client(host=llm_config["base_url"], timeout=600)
client.chat(
    model=config["ollama"]["model"],
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
Here llm_config["base_url"] is the URL of the Ollama server (it runs on a serverless GPU), which I can reach successfully from open-webui and even query the model from there without issues. The model I'm using is qwen2.5:32b-instruct-q4_K_M and the GPU is an RTX A6000.
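For comparison, this is roughly the streaming variant I also intend to try, since as far as I can tell open-webui consumes the model as a stream (untested on my side, so take it as a sketch):

# Streaming variant: same model and messages, but consume the response
# incrementally instead of waiting for one final JSON reply.
stream = client.chat(
    model=config["ollama"]["model"],
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a partial assistant message.
    print(chunk["message"]["content"], end="", flush=True)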
The client-side traceback from the script above is the following:
Traceback (most recent call last):
File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 236, in chat
return self._request_stream(
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 99, in _request_stream
return self._stream(*args, **kwargs) if stream else self._request(*args, **kwargs).json()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/shared/devilteo911/cvr-agent/.venv/lib/python3.11/site-packages/ollama/_client.py", line 75, in _request
raise ResponseError(e.response.text, e.response.status_code) from None
ollama._types.ResponseError: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
This is what I see on the server side:
The request fails this way every time after about 50 seconds, even though the client timeout is set to 600 seconds. Am I missing something?
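In case it helps narrow things down, this is the kind of sanity check I can run to see whether the 504 comes from the gateway in front of the serverless GPU rather than from the ollama library itself (a rough sketch that reuses my config values and assumes the standard /api/chat endpoint):

import httpx

# Call the Ollama HTTP API directly, bypassing the ollama client, with the
# same 600 s timeout. If this also fails with a 504 after ~50 s, the limit is
# presumably enforced by the proxy/gateway, not by the Python client.
resp = httpx.post(
    f'{llm_config["base_url"]}/api/chat',
    json={
        "model": config["ollama"]["model"],
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "stream": False,
    },
    timeout=600,
)
print(resp.status_code)
print(resp.text[:500])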