langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com

Rate limit error #634

Closed: gameveloster closed this issue 8 months ago

gameveloster commented 1 year ago

I'm getting an openai RateLimitError when embedding my chunked texts with "text-embedding-ada-002", even though I have rate limited the calls to 8 chunks of <1024 characters every 15 seconds.

openai.error.RateLimitError: Rate limit reached for default-global-with-image-limits in organization org-xxx on requests per min. Limit: 60.000000 / min. Current: 70.000000 / min. Contact support@openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://beta.openai.com/account/billing to add a payment method.

Every 15 seconds, I'm calling this once:

for i in range(0, len(chunked), 8):
    search_index.add_texts(texts=chunked[i : i + 8])
    time.sleep(15)

The chunks list chunked was created using:

text_splitter = NLTKTextSplitter(chunk_size=1024)
chunked = [chunk for source in sources for chunk in text_splitter.split_text(source)]

Why is my request rate exceeding 70/min when I'm only embedding at ~32 chunks/min? Does each chunk take more than 1 request to process?

Is there any way to better rate limit my embedding queries? Thanks

alvaropp commented 1 year ago

Perhaps not quite the same scenario, but I'm getting exactly the same error when running the VectorDB Question Answering with Sources example.

Perhaps adding some exponential backoff, as OpenAI recommends?
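A rough sketch of what that could look like, assuming the pre-1.0 openai package used throughout this thread (where the exception lives at openai.error.RateLimitError); the retried function is just whatever call is hitting the limit:

import random
import time

import openai

def call_with_backoff(fn, *args, max_retries=6, base_delay=1.0, **kwargs):
    # Retry fn on RateLimitError, doubling the wait (plus jitter) on each attempt.
    for attempt in range(max_retries):
        try:
            return fn(*args, **kwargs)
        except openai.error.RateLimitError:
            time.sleep(base_delay * 2 ** attempt + random.random())
    return fn(*args, **kwargs)  # final attempt: let any remaining error propagate

# usage, e.g.: call_with_backoff(search_index.add_texts, texts=chunked[i : i + 8])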

GaurangTandon commented 1 year ago

I ran into rate limits when using FAISS.from_texts on a markdown file of ~800 lines while following the Question Answering with Sources sample. I worked around it like this; posting in case it is useful for other users:

import time

import tqdm
from langchain.vectorstores import FAISS  # import path in the langchain versions current in this thread

def chunks(lst, n):
  # https://stackoverflow.com/a/312464/18903720
  """Yield successive n-sized chunks from lst."""
  for i in range(0, len(lst), n):
    yield lst[i:i + n]

text_chunks = chunks(texts, 20)  # adjust 20 based on your average character count per line
docsearch = None
for index, chunk in tqdm.tqdm(enumerate(text_chunks)):
  if index == 0:
    docsearch = FAISS.from_texts(chunk, embeddings)  # seed the index with just the first batch
  else:
    time.sleep(60)  # wait for a minute to not exceed any rate limits
    docsearch.add_texts(chunk)

prashanthdumpuriNeurance commented 1 year ago

Didn't work for me. Did OpenAI change something or am I missing something here?

Can you please help me?

juliencarponcy commented 1 year ago

Same for me today with the example at https://python.langchain.com/en/latest/use_cases/code/code-analysis-deeplake.html

Is there a way to integrate a solution into the example code to avoid it?

EricLee911110 commented 1 year ago

Still having the same issue. I tried something like this:

embeddings = OpenAIEmbeddings()
vector_store = FAISS.from_texts(texts=["example1", "example2"], embedding=embeddings)

and

vector_store = Chroma.from_texts(texts=["example1", "example2"], embedding=embeddings)

Got: Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details.

I'm passing a list that has a length of 2, and it is giving me RateLimitError.

I tried two versions of LangChain, 0.0.162 and 0.0.188, and both showed the same error.

mahithsc commented 1 year ago

I am running into the same issue when using the function:

Chroma.from_texts

Did anyone manage to come up with a solution that gets around the rate limit?

I'm thinking of looping through the texts in a try/except block, sleeping when the RateLimitError is hit and then retrying, as sketched below.
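Something like this, maybe (just a sketch: the 10-second wait is arbitrary, text_chunks stands for a list of batches like the one built by the chunks helper above, embeddings is an OpenAIEmbeddings instance, and the import paths match the pre-1.0 openai and older langchain used in this thread):

import time

from langchain.vectorstores import Chroma
from openai.error import RateLimitError

db = None
for batch in text_chunks:
    while True:
        try:
            if db is None:
                db = Chroma.from_texts(batch, embedding=embeddings)  # first batch creates the store
            else:
                db.add_texts(batch)
            break  # this batch succeeded, move on to the next
        except RateLimitError:
            time.sleep(10)  # back off, then retry the same batch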

fullstackwebdev commented 1 year ago

Any solution?

getsean commented 1 year ago

Is this the same issue you guys are getting?

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details.

ImcLiuQian commented 1 year ago

Yes, same error here (replying to getsean above).

lefta commented 1 year ago

@getsean @ImcLiuQian (and anyone else who gets "You exceeded your current quota" in the error message): this has nothing to do with the original question. Please see https://github.com/langchain-ai/langchain/issues/11914 instead.

mahithsc commented 1 year ago

The solution is to implement exponential backoff, or just a simple 10-second wait: use a try/except block, and when the exception is hit, wait 10 seconds before running the function again.
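For what it's worth, langchain's OpenAIEmbeddings already retries with backoff internally (that is the embed_with_retry message in the logs above), so if your version exposes the max_retries field, simply raising it may be enough:

embeddings = OpenAIEmbeddings(max_retries=10)  # the default is 6 in the versions discussed here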

chintanmehta21 commented 12 months ago

(Quoting GaurangTandon's chunked FAISS.from_texts workaround from above.)

Is there a way to do the same for FAISS.from_documents()?
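I'd guess the same pattern carries over by swapping in from_documents/add_documents. An untested sketch, reusing the chunks helper and embeddings from above, where docs is the Document list you would normally pass to FAISS.from_documents:

doc_batches = chunks(docs, 20)
docsearch = None
for index, batch in enumerate(doc_batches):
    if index == 0:
        docsearch = FAISS.from_documents(batch, embeddings)  # first batch creates the index
    else:
        time.sleep(60)  # throttle between batches
        docsearch.add_documents(batch)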

bilalProgTech commented 3 months ago

I tried the method below and it works for me:

vector_store = <your_vector_store>  # e.g. the FAISS or Chroma instance from above
documents = loader.load()  # any loader that you used
for doc in documents:
    vector_store.add_documents([doc])  # one document per request keeps each call small