Open zhongshuai-cao opened 2 weeks ago
I found some similar issues that might help you:

- [Question]: RateLimitError: Error code: 429 (closed) - The solution was to ensure the OpenAI API key is active and the account has sufficient quota. Enabling the automatic recharge setting on the OpenAI account can prevent this issue [1].
- Implement Rate Limit Handling in TogetherEmbedding() Class (closed) - Suggested implementing rate limit handling in the TogetherEmbedding() class to improve user experience [2].
- Getting rate limit error from Azure OpenAI when using GPTSimpleVectorIndex (closed) - Suggested reducing the embeddings creation request rate by adjusting the batch size or implementing a delay between requests [3].

To handle the RateLimitError, make the process wait for the specified time, and continue from where it left off, you can use the create_retry_decorator function. Here's an example:

import logging

import openai
from llama_index.embeddings.upstage.utils import create_retry_decorator

logger = logging.getLogger(__name__)

# Define the retry decorator with the desired parameters
retry_decorator = create_retry_decorator(
    max_retries=5,                # Maximum number of retries
    random_exponential=True,      # Use random exponential backoff
    stop_after_delay_seconds=60,  # Stop retrying after 60 seconds
    min_seconds=4,                # Minimum wait time between retries
    max_seconds=10,               # Maximum wait time between retries
)

@retry_decorator
def embed_batch(batch):
    # Your code to embed the batch using Azure OpenAI
    # (EMBED_MODEL is your embedding model/deployment name)
    response = openai.embeddings.create(model=EMBED_MODEL, input=batch)
    return response

# Example usage
batches = [...]  # Your list of batches to embed
for batch in batches:
    try:
        result = embed_batch(batch)
        # Process the result and build the index
    except openai.RateLimitError as e:
        logger.warning(f"Rate limit error: {e}")
        # Handle the rate limit error if needed

This code sets up a retry mechanism that handles RateLimitError by waiting a random exponential time between retries, up to a maximum of 5 retries or 60 seconds [4]. Additionally, you can adjust the batch size when initializing the AzureOpenAIEmbedding class:

from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

embedding_llm = AzureOpenAIEmbedding(
    embed_batch_size=1,  # Adjust the batch size as needed
)

This approach ensures that you can continue embedding and building the index from where it left off, even if you encounter rate limit errors [5].

To continue talking to Dosu, mention @dosu.
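For intuition, here is a minimal, self-contained sketch of what a retry decorator like the one above does under the hood. Everything in it is a stand-in for illustration: FakeRateLimitError substitutes for openai.RateLimitError, and the sleeps are scaled down to milliseconds so the demo runs instantly.

```python
import random
import time


class FakeRateLimitError(Exception):
    """Stand-in for openai.RateLimitError so the sketch is self-contained."""


def create_retry_decorator(max_retries=5, min_seconds=4.0, max_seconds=10.0):
    """Retry on FakeRateLimitError with capped random-exponential backoff."""

    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return fn(*args, **kwargs)
                except FakeRateLimitError:
                    if attempt == max_retries - 1:
                        raise  # out of retries: surface the error
                    # Random exponential backoff, clamped to [min, max];
                    # scaled by 0.001 only so this demo finishes quickly.
                    delay = min(max_seconds, min_seconds * (2 ** attempt))
                    time.sleep(random.uniform(0, delay) * 0.001)
        return wrapper

    return decorator


calls = {"n": 0}


@create_retry_decorator(max_retries=5)
def embed_batch(batch):
    calls["n"] += 1
    if calls["n"] < 3:  # fail twice, then succeed
        raise FakeRateLimitError("429: slow down")
    return [0.0] * len(batch)


result = embed_batch(["chunk-1", "chunk-2"])  # succeeds on the third attempt
```

The real decorator from llama_index adds a stop-after-total-delay condition on top of the attempt cap, but the retry-catch-sleep loop is the same shape.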
@dosu here is the code I use to build the index; please advise how to update it to use the logic you provided.
async_index_client = AsyncSearchIndexClient(
    endpoint=search_service_endpoint,
    credential=search_service_credential,
    api_version=search_service_api_version
)

vector_store = AzureAISearchVectorStore(
    search_or_index_client=async_index_client,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=EMBED_SIZE,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
)

for document in tqdm(documents, total=len(documents), desc="Processing files"):
    try:
        VectorStoreIndex.from_documents(
            [document],
            storage_context=storage_context,
            use_async=True
        )
To update your code to use the retry logic for handling RateLimitError with AzureOpenAI, you can follow these steps:
Import the necessary functions and modules:
import openai
from llama_index.llms.konko.utils import _create_retry_decorator
Create a retry decorator with the desired parameters:
retry_decorator = _create_retry_decorator(max_retries=5)
Apply the retry decorator to your function that makes the API call:
@retry_decorator
def embed_batch(batch):
    # EMBED_MODEL is your embedding model/deployment name
    response = openai.embeddings.create(model=EMBED_MODEL, input=batch)
    return response
Update your code to use the retry logic:
async_index_client = AsyncSearchIndexClient(
    endpoint=search_service_endpoint,
    credential=search_service_credential,
    api_version=search_service_api_version
)

vector_store = AzureAISearchVectorStore(
    search_or_index_client=async_index_client,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=EMBED_SIZE,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
)

for document in tqdm(documents, total=len(documents), desc="Processing files"):
    try:
        VectorStoreIndex.from_documents(
            [document],
            storage_context=storage_context,
            use_async=True
        )
    except openai.RateLimitError as e:
        logger.warning(f"Rate limit error: {e}")
        # Handle the rate limit error if needed
By following these steps, you will have integrated the retry logic to handle RateLimitError with AzureOpenAI in your code [1][2][3][4].
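Separately from retries, "continuing from where it left off" across process restarts needs a record of which documents have already been indexed. As a rough sketch (the checkpoint file, doc_id pairing, and index_document callable are hypothetical stand-ins for the VectorStoreIndex.from_documents loop above):

```python
import json
import tempfile
from pathlib import Path


class FakeRateLimitError(Exception):
    """Stand-in for openai.RateLimitError so the sketch is self-contained."""


def load_done(path: Path) -> set:
    """Read the set of already-indexed doc ids from the checkpoint file."""
    return set(json.loads(path.read_text())) if path.exists() else set()


def save_done(path: Path, done: set) -> None:
    """Persist the set of indexed doc ids after each success."""
    path.write_text(json.dumps(sorted(done)))


def run(documents, index_document, checkpoint: Path) -> set:
    done = load_done(checkpoint)
    for doc_id, doc in documents:
        if doc_id in done:
            continue  # already indexed on a previous run
        try:
            index_document(doc)
        except FakeRateLimitError:
            break  # stop here; the next run resumes from this document
        done.add(doc_id)
        save_done(checkpoint, done)
    return done


# Demo: the second document fails once, then the next run resumes from it.
attempts = {"n": 0}


def flaky_index(doc):
    attempts["n"] += 1
    if attempts["n"] == 2:  # simulate a 429 on the second document
        raise FakeRateLimitError()


docs = [("d1", "text"), ("d2", "text"), ("d3", "text")]
ckpt = Path(tempfile.mkdtemp()) / "done.json"
first = run(docs, flaky_index, ckpt)   # stops after d1
second = run(docs, flaky_index, ckpt)  # skips d1, finishes d2 and d3
```

This pattern composes with any retry decorator: retries absorb transient 429s within a run, while the checkpoint lets a killed or crashed run resume without re-embedding finished documents.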
@dosu
I am actually using Azure OpenAI rather than openai, as in the code snippet below, and from your answer, the embed_batch function is not used at all.
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
embed_model = AzureOpenAIEmbedding(
    model=EMBED_MODEL,
    deployment_name=EMBED_DEPLOYMENT_NAME,
    api_key=AZURE_OPENAI_API_KEY,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_version=AZURE_OPENAI_API_VERSION,
    embed_batch_size=5
)
To handle RateLimitError using Azure OpenAI with the AzureOpenAIEmbedding class, you can add a retry mechanism directly in the methods that make API calls. Here's how you can update your code:
Import necessary modules:
import time

from openai import AzureOpenAI, AsyncAzureOpenAI, RateLimitError
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
Define a retry mechanism:
class AzureOpenAIEmbeddingWithRetry(AzureOpenAIEmbedding):
    def _get_client(self) -> AzureOpenAI:
        if not self.reuse_client:
            return self._create_client_with_retries()
        if self._client is None:
            self._client = self._create_client_with_retries()
        return self._client

    def _get_aclient(self) -> AsyncAzureOpenAI:
        if not self.reuse_client:
            return self._create_aclient_with_retries()
        if self._aclient is None:
            self._aclient = self._create_aclient_with_retries()
        return self._aclient

    def _create_client_with_retries(self) -> AzureOpenAI:
        retries = 0
        while retries < self.max_retries:
            try:
                return AzureOpenAI(**self._get_credential_kwargs())
            except RateLimitError:
                retries += 1
                time.sleep(2 ** retries)  # Exponential backoff
        # RuntimeError here: openai's RateLimitError cannot be constructed
        # from a bare message (it requires a response object)
        raise RuntimeError("Max retries exceeded for creating AzureOpenAI client")

    def _create_aclient_with_retries(self) -> AsyncAzureOpenAI:
        retries = 0
        while retries < self.max_retries:
            try:
                return AsyncAzureOpenAI(**self._get_credential_kwargs(is_async=True))
            except RateLimitError:
                retries += 1
                time.sleep(2 ** retries)  # Exponential backoff
        raise RuntimeError("Max retries exceeded for creating AsyncAzureOpenAI client")
Use the updated class in your code:
embed_model = AzureOpenAIEmbeddingWithRetry(
    model=EMBED_MODEL,
    deployment_name=EMBED_DEPLOYMENT_NAME,
    api_key=AZURE_OPENAI_API_KEY,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_version=AZURE_OPENAI_API_VERSION,
    embed_batch_size=5,
    max_retries=5  # Set the maximum number of retries
)

async_index_client = AsyncSearchIndexClient(
    endpoint=search_service_endpoint,
    credential=search_service_credential,
    api_version=search_service_api_version
)

vector_store = AzureAISearchVectorStore(
    search_or_index_client=async_index_client,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=EMBED_SIZE,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
)

for document in tqdm(documents, total=len(documents), desc="Processing files"):
    try:
        VectorStoreIndex.from_documents(
            [document],
            storage_context=storage_context,
            use_async=True
        )
    except RateLimitError as e:
        logger.warning(f"Rate limit error: {e}")
        # Handle the rate limit error if needed
This code integrates a retry mechanism with exponential backoff to handle RateLimitError when creating the AzureOpenAI and AsyncAzureOpenAI clients, ensuring that your embedding process can continue from where it left off [1].
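The traceback in the bug description shows that Azure advertises the exact wait in the 429 message ("Please retry after 32 seconds"). Rather than a fixed backoff, the handler can honor that hint. A minimal sketch, assuming the error text follows the wording in the traceback (the retry_after_seconds helper is hypothetical, not a library function):

```python
import re


def retry_after_seconds(message: str, default: float = 30.0) -> float:
    """Extract the advertised wait time from an Azure OpenAI 429 message.

    Falls back to `default` when the message carries no "retry after" hint.
    """
    match = re.search(r"retry after (\d+) seconds", message, re.IGNORECASE)
    return float(match.group(1)) if match else default


# Example message, abridged from the traceback in the bug description
msg = (
    "Requests to the Embeddings_Create Operation under Azure OpenAI API "
    "version 2023-07-01-preview have exceeded call rate limit of your "
    "current OpenAI S0 pricing tier. Please retry after 32 seconds."
)
wait = retry_after_seconds(msg)  # 32.0
```

Inside the except RateLimitError handler above, calling time.sleep(retry_after_seconds(str(e))) before retrying the current document would wait exactly as long as the service requests instead of guessing.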
Bug Description
I got a retrying error raised when running batch embeddings with AzureOpenAI, and I wonder how to make it wait for the limited time (60 s per AOAI, but ~30 s per the raised error code) and automatically continue to embed and build the index from where it left off.
Retrying llama_index.embeddings.openai.base.aget_embeddings in 0.6633798960965387 seconds as it raised RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the Embeddings_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 32 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'
Version
0.10.67.post1
Steps to Reproduce
Indexing a very large file or a bigger batch will cause this error.
Relevant Logs/Tracebacks