run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: When using LlamaIndex to access a local Ollama LLM (llama3:8b) service on a Mac, a 503 error occurs #13640

Open toondata opened 2 months ago

toondata commented 2 months ago

Bug Description

In a Mac environment, when the OS system proxy is set, using LlamaIndex to access the local Ollama LLM results in an access error.

Error message:

httpx.HTTPStatusError: Server error '503 Service Unavailable' for url 'http://127.0.0.1:11434/api/chat' For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/503

I also tried changing the base_url, but it errors too:

httpx.HTTPStatusError: Server error '503 Service Unavailable' for url 'http://localhost:11434/api/chat' For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/503

So maybe this has nothing to do with the proxy.

But when I build the same HTTP request with the Rapid API tool, it can access the service with no error.

Rapid API access is OK (see attached screenshot).

Version

0.10.37

Steps to Reproduce

self.llm = Ollama(model="llama3:8b", request_timeout=120.0, json_mode=True, base_url="http://127.0.0.1:11434")

messages = [
    ChatMessage(role="system", content=systemPrompt),
    ChatMessage(role="user", content=userPrompt),
]
resp = self.llm.chat(messages=messages)  # error

Using the same code but switching the LLM to OpenAI runs fine:

self.llm = OpenAI(model=model, api_key="sk-xxxxxxx")
messages = [
    ChatMessage(role="system", content=systemPrompt),
    ChatMessage(role="user", content=userPrompt),
]
resp = self.llm.chat(messages=messages)

Relevant Logs/Tracebacks

File "/Users/xxx/Library/Mobile Documents/com~apple~CloudDocs/products/dev/workspaces/flask/llm/ollama.py", line 31, in chat
    resp = self.llm.chat(messages=messages)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/anaconda3/envs/ollama/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 148, in wrapped_llm_chat
    f_return_val = f(_self, messages, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/anaconda3/envs/ollama/lib/python3.11/site-packages/llama_index/llms/ollama/base.py", line 135, in chat
    response.raise_for_status()
  File "/Users/xxx/anaconda3/envs/ollama/lib/python3.11/site-packages/httpx/_models.py", line 761, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '503 Service Unavailable' for url 'http://127.0.0.1:11434/api/chat'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/503
dosubot[bot] commented 2 months ago

To resolve the 503 Service Unavailable error when using LlamaIndex to access a local Ollama LLM service (llama3:8b) on a Mac with the OS system proxy set, you can try bypassing the system proxy for local addresses. This can be done by configuring your proxy settings to exclude the local addresses localhost and 127.0.0.1.

Here is how you can do it:

  1. Open System Preferences on your Mac.
  2. Go to Network settings.
  3. Select the network interface you are using (e.g., Wi-Fi or Ethernet).
  4. Click on Advanced.
  5. Go to the Proxies tab.
  6. In the Bypass proxy settings for these Hosts & Domains field, add localhost and 127.0.0.1.

This will ensure that requests to your local Ollama LLM service are not routed through the system proxy, which should resolve the 503 Service Unavailable error.
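If you prefer to handle this in code rather than in System Preferences, a minimal sketch is shown below. It assumes httpx (the HTTP client used by the Ollama integration) is left with its default trust_env behaviour, in which case it honours the NO_PROXY environment variable; note this only helps if the proxy is reaching httpx via environment variables (HTTP_PROXY/HTTPS_PROXY/ALL_PROXY).

import os

from llama_index.core.llms import ChatMessage
from llama_index.llms.ollama import Ollama

# Exclude local addresses from any proxy picked up from the environment.
# This must be set before the first request to the Ollama server is made.
os.environ["NO_PROXY"] = "localhost,127.0.0.1"

llm = Ollama(model="llama3:8b", request_timeout=120.0, base_url="http://127.0.0.1:11434")
resp = llm.chat(messages=[ChatMessage(role="user", content="ping")])
print(resp)

Exporting NO_PROXY=localhost,127.0.0.1 in the shell before starting the application achieves the same effect.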

Additionally, ensure that your Ollama server is running and accessible at the specified base_url (http://localhost:11434).

Here is a sample code snippet to verify the connection:

from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3:8b", request_timeout=60.0)

try:
    response = llm.complete("What is the capital of France?")
    print(response)
except Exception as e:
    print(f"Error: {e}")

Make sure the server is running by executing ollama serve in your terminal and that the model is downloaded using ollama pull llama3:8b.

If the issue persists, you can enable additional logging to gain more insight into the problem:

  1. Basic Logging: Turn on debug logging to get more detailed logs of what your application is doing.

    import logging
    import sys
    
    logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
    logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
  2. Callback Handler: Use the callback manager to add callbacks that help debug, track, and trace the inner workings of the library. For example, you can use the LlamaDebugHandler to print the trace of events after most operations.

    from llama_index.core import Settings
    from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

    # Print a trace of events after each operation
    debug_handler = LlamaDebugHandler(print_trace_on_end=True)
    Settings.callback_manager = CallbackManager([debug_handler])

These options should help you gain more insight into the 503 Service Unavailable error and identify any underlying issues.

#### Details

**Sources**

- [Using Ollama with Llamaindex return Error](https://github.com/run-llama/llama_index/discussions/11333)
- [llama-index-integrations/llms/llama-index-llms-ollama/llama_index/llms/ollama/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-ollama/llama_index/llms/ollama/base.py)
- [docs/docs/understanding/tracing_and_debugging/tracing_and_debugging.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/understanding/tracing_and_debugging/tracing_and_debugging.md)
- [llama-index-core/llama_index/core/callbacks/llama_debug.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/callbacks/llama_debug.py)
- [llama-index-legacy/llama_index/legacy/callbacks/llama_debug.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/callbacks/llama_debug.py)
- [llama-index-integrations/llms/llama-index-llms-ollama/pyproject.toml](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-ollama/pyproject.toml)
- [llama-index-integrations/multi_modal_llms/llama-index-multi-modal-llms-ollama/pyproject.toml](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/multi_modal_llms/llama-index-multi-modal-llms-ollama/pyproject.toml)
- [llama-index-integrations/embeddings/llama-index-embeddings-ollama/pyproject.toml](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/embeddings/llama-index-embeddings-ollama/pyproject.toml)
- [docs/docs/CHANGELOG.md](https://github.com/run-llama/llama_index/blob/main/docs/docs/CHANGELOG.md)

RussellLuo commented 2 months ago

> In a Mac environment, when the OS system proxy is set, using LlamaIndex to access the local Ollama LLM results in an access error.

If the proxy is turned off, does LlamaIndex still throw the same error?

> So maybe this has nothing to do with the proxy.

> But when I build the same HTTP request with the Rapid API tool, it can access the service with no error.

I'm not familiar with Rapid API, but perhaps it is aware of the proxy if you have set Proxy Configurations for it.

As a comparison, I suggest you try the cURL command to see what results it yields.
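For the same kind of check from Python, here is a small probe using httpx, the client library the Ollama integration relies on. It is only a sketch that assumes the standard Ollama /api/chat endpoint and compares trust_env=True (proxy environment variables respected) with trust_env=False (ignored):

import httpx

payload = {
    "model": "llama3:8b",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,
}

# trust_env=False tells httpx to ignore HTTP_PROXY/HTTPS_PROXY/ALL_PROXY,
# so the request goes straight to the local Ollama server.
for trust_env in (True, False):
    with httpx.Client(trust_env=trust_env, timeout=120.0) as client:
        try:
            r = client.post("http://127.0.0.1:11434/api/chat", json=payload)
            print(f"trust_env={trust_env}: {r.status_code}")
        except httpx.HTTPError as exc:
            print(f"trust_env={trust_env}: {exc!r}")

If the request succeeds only with trust_env=False, the 503 is coming from the proxy rather than from Ollama itself.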

toondata commented 2 months ago

Thank you everyone. The ultimate issue did indeed stem from the global proxy setting. It remains unclear why Rapid API behaved normally without triggering the proxy, while curl and the Python Flask application's calls showed similar failures. Additionally, bypassing the global proxy via the macOS network settings proved ineffective on my machine, though it's unclear whether that is due to my configuration. This isn't my most pressing concern at the moment, so I've switched my local LLM experiments to testing model performance through OpenRouter instead. Thanks again, everyone.