run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.get_embeddings in 0.7360705661705746 seconds as it raised APIConnectionError: Connection error.. #15238

Open elahea2020 opened 1 month ago

elahea2020 commented 1 month ago

Bug Description

I am running the script below as part of a bigger project and every once in a while I get this warning: `WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.get_embeddings in 0.7360705661705746 seconds as it raised APIConnectionError: Connection error..`

It happens after around 2k calls but it's not happening consistently. I would appreciate your help with this.


```python
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

atlas_vector_search = MongoDBAtlasVectorSearch(
    db_client,
    db_name=db_name,
    collection_name=collection_name,
    index_name=index_name,
)

# Get the storage context
vector_store_context = StorageContext.from_defaults(vector_store=atlas_vector_search)

# Upload the new data to the Atlas vector search index
_ = VectorStoreIndex.from_documents(
    [document], storage_context=vector_store_context, show_progress=True
)
```

### Version

0.10.33

### Steps to Reproduce

Not entirely sure since it's not a consistent occurrence.

### Relevant Logs/Tracebacks

_No response_
dosubot[bot] commented 1 month ago

Hey @elahea2020! I'm here to help you with any bugs, questions, or contributions. If you need assistance, feel free to ask!

The warning you're encountering comes from the retry mechanism in the llama_index.embeddings.openai.base.get_embeddings method. This method uses retries with random exponential backoff to handle transient connection errors: it retries up to 6 times, waiting between 4 and 10 seconds between attempts, and gives up once 60 seconds have elapsed in total [1][2].
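For reference, that backoff schedule can be sketched with the standard library alone. The function name and parameters below are illustrative, not the library's actual internals:

```python
import random

def backoff_wait(attempt, min_wait=4.0, max_wait=10.0):
    """Random exponential wait: up to 2**attempt seconds, clamped to [min_wait, max_wait]."""
    wait = random.uniform(0, 2 ** attempt)
    return min(max(wait, min_wait), max_wait)

# Every wait for attempts 0..5 stays within the 4-10 second window.
waits = [backoff_wait(n) for n in range(6)]
```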

To mitigate this issue, you can implement a more robust retry mechanism with exponential backoff in your script. Here's an example of how you can modify your script to include retries with exponential backoff:

```python
import logging
import threading
import time
from functools import wraps

import requests
from requests.exceptions import RequestException

logger = logging.getLogger(__name__)

def rate_limit(*, allowed_per_second: int):
    """Decorator that spaces out calls so at most `allowed_per_second` run per second."""
    max_period = 1.0 / allowed_per_second
    last_call = [time.perf_counter()]
    lock = threading.Lock()

    def decorate(func):
        @wraps(func)
        def limit(*args, **kwargs):
            with lock:
                elapsed = time.perf_counter() - last_call[0]
                hold = max_period - elapsed
                if hold > 0:
                    time.sleep(hold)
                result = func(*args, **kwargs)
                last_call[0] = time.perf_counter()
            return result

        return limit

    return decorate

@rate_limit(allowed_per_second=15)
def rate_limited_get(url, headers, retries=5, backoff_factor=0.3):
    """
    Make a GET request with rate limiting and retry with exponential backoff.
    """
    for attempt in range(retries):
        try:
            resp = requests.get(url, headers=headers)
            if resp.status_code == 429:
                logger.warning("Rate limited, sleeping for 5 minutes")
                time.sleep(300)
                # Pass the retry settings along so the budget isn't silently reset.
                return rate_limited_get(url, headers, retries=retries, backoff_factor=backoff_factor)
            resp.raise_for_status()
            return resp
        except RequestException as e:
            logger.warning(f"Request failed: {e}, attempt {attempt + 1} of {retries}")
            if attempt < retries - 1:
                sleep_time = backoff_factor * (2 ** attempt)
                logger.info(f"Retrying in {sleep_time} seconds...")
                time.sleep(sleep_time)
            else:
                logger.error("Max retries reached. Raising exception.")
                raise
```
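The retry-with-backoff pattern itself can be exercised without any network access. This self-contained sketch (illustrative names, sleep times shrunk so it runs instantly) fails twice and then succeeds, mimicking an intermittent connection error:

```python
import time

def retry_with_backoff(func, retries=5, backoff_factor=0.001):
    """Call func(), retrying on ConnectionError with exponential backoff between attempts."""
    for attempt in range(retries):
        try:
            return func()
        except ConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff_factor * (2 ** attempt))

calls = {"n": 0}

def flaky():
    # Simulates the intermittent APIConnectionError: fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Connection error.")
    return "embedding-ok"

result = retry_with_backoff(flaky)  # succeeds on the third attempt
```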

Additionally, you can enable detailed logging to gain more insights into the cause of the intermittent connection errors. Here's how you can set up logging in your application:

```python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
```
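To confirm the handler wiring actually captures the retry warnings, you can attach an in-memory handler to the logger namespace they come from. A stdlib-only check (no LlamaIndex required; the log message here is illustrative):

```python
import io
import logging

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setLevel(logging.DEBUG)

# Attach to the logger namespace the retry warnings are emitted from.
retry_logger = logging.getLogger("llama_index.embeddings.openai.utils")
retry_logger.setLevel(logging.DEBUG)
retry_logger.addHandler(handler)

retry_logger.warning("Retrying get_embeddings in 0.7 seconds")
captured = buffer.getvalue()
```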

For more advanced monitoring, you can use callback handlers provided by LlamaIndex, such as `LlamaDebugHandler` or `LangfuseCallbackHandler`, to track events and traces [3][4][5].

These steps should help you handle the intermittent APIConnectionError warnings more effectively.

To continue talking to Dosu, mention @dosu.