langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

429 Resource Exhausted error when using gemini-1.5-pro with langchain #22241

Open sidagarwal04 opened 1 month ago

sidagarwal04 commented 1 month ago


Example Code

import gradio as gr
import typing_extensions
import os

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts.prompt import PromptTemplate
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain.memory import ConversationBufferMemory

# process of getting credentials
def get_credentials():
    google_api_key = os.getenv("GOOGLE_API_KEY")  # read the Google API key from the environment
    if google_api_key is None:
        raise ValueError("Provide your Google API Key")

    return google_api_key

# make the key available to the Google GenAI client
os.environ["GOOGLE_API_KEY"] = get_credentials()

NEO4J_URI = os.getenv("NEO4J_URI")
NEO4J_USERNAME = os.getenv("NEO4J_USERNAME")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")

CYPHER_GENERATION_TEMPLATE = """You are an expert Neo4j Cypher translator who understands questions in English and converts them to Cypher strictly based on the Neo4j schema provided, following the instructions below:
1. Generate Cypher queries compatible ONLY with Neo4j Version 5
2. Do not use the EXISTS or SIZE keywords in the Cypher. Use aliases when using the WITH keyword
3. Do not use the same variable names for different nodes and relationships in the query.
4. Use only nodes and relationships mentioned in the schema
5. Always enclose the Cypher output inside 3 backticks
6. Always do a case-insensitive and fuzzy search for any property-related search. E.g., to search for a company name use `toLower(c.name) contains 'neo4j'`
7. The Candidate node is synonymous with Manager
8. Always use aliases to refer to the nodes in the query
9. 'Answer' is NOT a Cypher keyword. Answer should never be used in a query.
10. Generate only one Cypher query per question.
11. Cypher is NOT SQL, so do not mix the two syntaxes.
12. Every Cypher query starts with a MATCH keyword.
13. Always do a fuzzy search for any property-related search. E.g., when the user asks for "matrix" instead of "the matrix", search for the movie title using `toLower(m.title) contains 'matrix'`
Schema:
{schema}
Samples:
Question: List down 5 movies that released after the year 2000
Answer: MATCH (m:Movie)  WHERE m.released > 2000  RETURN m LIMIT 5
Question: Get all the people who acted in a movie that was released after 2010
Answer: MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)  WHERE m.released > 2010  RETURN p,r,m
Question: Name the Director of the movie Apollo 13
Answer: MATCH (m:Movie)<-[:DIRECTED]-(p:Person)  WHERE toLower(m.title) contains "apollo 13"  RETURN p.name
Question: {question}
Answer: 
"""

CYPHER_GENERATION_PROMPT = PromptTemplate(
    input_variables=["schema","question"], validate_template=True, template=CYPHER_GENERATION_TEMPLATE
)

CYPHER_QA_TEMPLATE = """You are an assistant that helps to form nice, human-understandable answers.
The information part contains the provided information that you must use to construct an answer.
The provided information is authoritative; you must never doubt it or try to use your internal knowledge to correct it.
Make the answer sound like a response to the question. Do not mention that you based the result on the given information.
Here are two examples:

Question: List down 5 movies that released after the year 2000
Context:[movie:The Matrix Reloaded, movie:The Matrix Revolutions, movie:Something's Gotta Give, movie:The Polar Express, movie:RescueDawn]
Helpful Answer: The Matrix Reloaded, The Matrix Revolutions, Something's Gotta Give, The Polar Express and RescueDawn are the movies released after the year 2000.

Question: Who is the director of the movie V for Vendetta
Context:[person:James Marshall]
Helpful Answer: James Marshall is the director of the movie V for Vendetta.

If the provided information is empty, say that you don't know the answer.
Final answer should be easily readable and structured.
Information:
{context}

Question: {question}
Helpful Answer:"""

CYPHER_QA_PROMPT = PromptTemplate(
    input_variables=["context", "question"], template=CYPHER_QA_TEMPLATE
)

graph = Neo4jGraph(
    url=NEO4J_URI, 
    username=NEO4J_USERNAME, 
    password=NEO4J_PASSWORD,
    enhanced_schema=True
)

chain = GraphCypherQAChain.from_llm(
    ChatGoogleGenerativeAI(model='gemini-1.5-pro', max_output_tokens=8192, temperature=0.0),
    graph=graph,
    cypher_prompt=CYPHER_GENERATION_PROMPT,
    qa_prompt=CYPHER_QA_PROMPT,
    verbose=True,
    validate_cypher=True
)

# note: this memory object is created but never passed to the chain below
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

def chat_response(input_text, history):
    try:
        return str(chain.invoke(input_text)['result'])

    except Exception as e:  # Catch specific exceptions or log the error
        print(f"An error occurred: {e}")
        return "I'm sorry, there was an error retrieving the information you requested."

interface = gr.ChatInterface(fn=chat_response,
                             title="Movies Chatbot",
                             theme="soft",
                             chatbot=gr.Chatbot(height=430),
                             undo_btn=None,
                             clear_btn="\U0001F5D1 Clear Chat",
                             examples=["List down 5 movies that released after the year 2000",
                                       "Get all the people who acted in a movie that was released after 2010",
                                       "Name the Director of the movie Apollo 13",
                                       "Who are the actors in the movie V for Vendetta"])

# Launch the interface
interface.launch(share=True)

Error Message and Stack Trace (if applicable)

===== Application Startup at 2024-05-28 17:43:47 =====

Caching examples at: '/home/user/app/gradio_cached_examples/14'
Caching example 1/4

Entering new GraphCypherQAChain chain...
Generated Cypher:

MATCH (m:Movie) WHERE m.released > 2000 RETURN m LIMIT 5

Full Context: [{'m': {'tagline': 'Free your mind', 'title': 'The Matrix Reloaded', 'released': 2003}}, {'m': {'tagline': 'Everything that has a beginning has an end', 'title': 'The Matrix Revolutions', 'released': 2003}}, {'m': {'title': "Something's Gotta Give", 'released': 2003}}, {'m': {'tagline': 'This Holiday Season… Believe', 'title': 'The Polar Express', 'released': 2004}}, {'m': {'tagline': "Based on the extraordinary true story of one man's fight for freedom", 'title': 'RescueDawn', 'released': 2006}}]

Finished chain.
Caching example 2/4

Entering new GraphCypherQAChain chain...
Generated Cypher:
cypher MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) WHERE m.released > 2010 RETURN p, r, m

Full Context:
Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 2.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..
[{'p': {'born': 1960, 'name': 'Hugo Weaving'}, 'r': ({'born': 1960, 'name': 'Hugo Weaving'}, 'ACTED_IN', {'tagline': 'Everything is connected', 'title': 'Cloud Atlas', 'released': 2012}), 'm': {'tagline': 'Everything is connected', 'title': 'Cloud Atlas', 'released': 2012}}, {'p': {'born': 1956, 'name': 'Tom Hanks'}, 'r': ({'born': 1956, 'name': 'Tom Hanks'}, 'ACTED_IN', {'tagline': 'Everything is connected', 'title': 'Cloud Atlas', 'released': 2012}), 'm': {'tagline': 'Everything is connected', 'title': 'Cloud Atlas', 'released': 2012}}, {'p': {'born': 1966, 'name': 'Halle Berry'}, 'r': ({'born': 1966, 'name': 'Halle Berry'}, 'ACTED_IN', {'tagline': 'Everything is connected', 'title': 'Cloud Atlas', 'released': 2012}), 'm': {'tagline': 'Everything is connected', 'title': 'Cloud Atlas', 'released': 2012}}, {'p': {'born': 1949, 'name': 'Jim Broadbent'}, 'r': ({'born': 1949, 'name': 'Jim Broadbent'}, 'ACTED_IN', {'tagline': 'Everything is connected', 'title': 'Cloud Atlas', 'released': 2012}), 'm': {'tagline': 'Everything is connected', 'title': 'Cloud Atlas', 'released': 2012}}]

Finished chain.
Caching example 3/4

Entering new GraphCypherQAChain chain...
Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 2.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..
Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 4.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..
Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 8.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..
Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 16.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..
Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 32.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..
Generated Cypher:
cypher MATCH (m:Movie)<-[:DIRECTED]-(p:Person) WHERE toLower(m.title) CONTAINS 'apollo 13' RETURN p.name

Full Context: [{'p.name': 'Ron Howard'}]

Finished chain.
Caching example 4/4

Entering new GraphCypherQAChain chain...
Generated Cypher:
cypher MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) WHERE toLower(m.title) contains "v for vendetta" RETURN p

Full Context:
[{'p': {'born': 1960, 'name': 'Hugo Weaving'}}, {'p': {'born': 1981, 'name': 'Natalie Portman'}}, {'p': {'born': 1946, 'name': 'Stephen Rea'}}, {'p': {'born': 1940, 'name': 'John Hurt'}}, {'p': {'born': 1967, 'name': 'Ben Miles'}}]
Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 2.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..
Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 4.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..
Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 8.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..
Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 16.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..
Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 32.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..

Finished chain.
Running on local URL: http://0.0.0.0:7860
/usr/local/lib/python3.10/site-packages/gradio/blocks.py:2368: UserWarning: Setting share=True is not supported on Hugging Face Spaces
  warnings.warn(

To create a public link, set share=True in launch().

Description

I am trying to use a Google Gemini 1.5 Pro API key from Google AI Studio in the above code and am getting the error: Retrying langchain_google_genai.chat_models._chat_with_retry.<locals>._chat_with_retry in 2.0 seconds as it raised ResourceExhausted: 429 Resource has been exhausted (e.g. check quota)..

This doesn't seem right, as the API was called only twice. I tried switching to gemini-1.5-flash and it seems to work fine, so I am assuming this has something to do with gemini-1.5-pro's implementation in langchain. Quoting one of the replies in a somewhat similar issue:

My theory is that the gemini-pro-1.5-latest endpoint has some sort of other limit that we as users can't see when using the "generativeai" Python SDK. The only thing that shows up in metrics is failed API calls, but NOT limit hits. The way around this, I believe, would be to use the Vertex SDK directly, not the GenAI API.
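
One client-side mitigation while the root cause is unclear: throttle questions before they ever reach the SDK's retry loop. Note that GraphCypherQAChain makes two LLM calls per question (one to generate the Cypher, one to phrase the answer), so a single question can already use up a 2-requests-per-minute budget. A minimal sketch, assuming the free tier really is limited to about 2 requests per minute (an assumption; check your quota page for the real number):

import time

# Assumed free-tier limit for gemini-1.5-pro: ~2 requests/minute.
# GraphCypherQAChain issues 2 LLM calls per question, so allow one
# question per minute to stay under the assumed cap.
MIN_SECONDS_BETWEEN_QUESTIONS = 60.0
_last_call = 0.0

def throttled_chat_response(input_text, history):
    """Drop-in replacement for chat_response that spaces out questions."""
    global _last_call
    wait = MIN_SECONDS_BETWEEN_QUESTIONS - (time.time() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.time()
    return chat_response(input_text, history)

Passing throttled_chat_response as fn to gr.ChatInterface trades latency for fewer 429s; it does nothing about per-minute token caps, only request counts.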

System Info

neo4j-driver
gradio
langchain==0.1.20
langchain_google_genai
langchain-community
Benniepie commented 1 month ago

Hello - not sure if this will help you, but I think it's the cause of my 429 errors: is it the 32,000 tokens-per-minute limit that is breaking your request? I'm getting 429 errors on every prompt over 32,000 tokens sent via the API for Gemini-1.5-Pro. I think it must be a mistake... otherwise the 1,000,000-token context window becomes a bit irrelevant!

[Screenshot 2024-06-01 152914]
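
If you want to sanity-check whether a single request is tripping that cap, a rough pre-flight estimate helps. This sketch uses the common ~4-characters-per-token heuristic, which is only an approximation and not Gemini's real tokenizer:

ASSUMED_TPM_CAP = 32_000  # tokens/minute, per the limit discussed above

def estimate_tokens(text: str) -> int:
    """Rough estimate: English text averages ~4 characters per token."""
    return max(1, len(text) // 4)

prompt = "...your long prompt..."
if estimate_tokens(prompt) > ASSUMED_TPM_CAP:
    print("This prompt alone likely exceeds the assumed per-minute token cap.")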

sidagarwal04 commented 1 month ago

I don't think I am sending that many tokens per minute but let me check again if that's the case.

paopao0226 commented 1 month ago

Has this been solved? I also have the same problem.

sidagarwal04 commented 1 month ago

@paopao0226 Not yet. Still struggling with this. :(

bencipher commented 2 weeks ago

I also have the same issue on gemini-1.0-pro

[image]

ReEnMikki commented 2 weeks ago

Same problem bruh. I want to give the AI a very long system prompt as context, and I was wondering how the 1M-token limit could possibly be exhausted already when what I give it is barely 40-50K tokens at most. This 32K tokens-per-minute limit is stupid.

Joseph-Cardwell commented 2 weeks ago

Perhaps the 1-million-token context only works with documents that are uploaded?

bencipher commented 1 week ago

Which model are you using, @ReEnMikki and @Joseph-Cardwell?

ReEnMikki commented 1 week ago

> Which model are you using, @ReEnMikki and @Joseph-Cardwell?

I was using gemini-1.5-pro

I think @Benniepie's image is correct: when I use gemini-1.5-flash I can upload tons of text in one go with no problem. The only problem is that that model is dumber than gemini-1.5-pro, which is already dumber than GPT-4o.

bencipher commented 1 week ago

😂 so OpenAI is clearly ahead in performance, sadly. But I think the reason for your error is that you called the model more times than you're allowed to in a minute. For gemini-1.5-pro I think it's 2x per minute.
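
If you can upgrade, newer langchain-core versions (0.2.24+) ship a built-in rate limiter that chat models accept, so the client waits for a free slot instead of hammering the API and eating 429s. A sketch pinned to that assumed 2-requests-per-minute limit (verify the real number on your quota page):

from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_google_genai import ChatGoogleGenerativeAI

# Assumed free-tier limit: 2 requests/minute = 1 request every 30 s.
rate_limiter = InMemoryRateLimiter(
    requests_per_second=2 / 60,
    check_every_n_seconds=1.0,  # how often to poll for a free slot
    max_bucket_size=1,          # no bursting above the limit
)

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0.0,
    rate_limiter=rate_limiter,  # calls now block until a slot opens
)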


ReEnMikki commented 1 week ago

32k tokens per minute defeats the whole purpose of the 1M-token context window. Now if I want to give it a long set of instructions as a system prompt, to serve as base context for it to read from when generating outputs, I have to call the API dozens of times, which means waiting dozens of minutes. Bastard Google put this unnecessary limit in to subtly force us to pay, and it suffocates the viability of the free tier.
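
If you're stuck on the free tier, you can at least make the waiting deterministic instead of burning it on retries: track a tokens-per-minute budget yourself and sleep until the window resets. A minimal sketch, assuming the 32k tokens-per-minute cap discussed above and the rough 4-chars-per-token estimate:

import time

ASSUMED_TPM_CAP = 32_000  # tokens/minute (free-tier assumption)

_window_start = time.time()
_tokens_in_window = 0

def wait_for_token_budget(prompt: str) -> None:
    """Sleep until the current one-minute window can fit this prompt."""
    global _window_start, _tokens_in_window
    needed = max(1, len(prompt) // 4)  # rough ~4 chars/token estimate
    now = time.time()
    if now - _window_start >= 60:  # window expired, start a new one
        _window_start, _tokens_in_window = now, 0
    if _tokens_in_window + needed > ASSUMED_TPM_CAP:
        time.sleep(max(0.0, 60 - (now - _window_start)))  # wait out the window
        _window_start, _tokens_in_window = time.time(), 0
    _tokens_in_window += needed

Call wait_for_token_budget(prompt) right before each API call; a prompt bigger than the cap can't fit in any single window, so it would still need trimming.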

bencipher commented 1 week ago

What are you building, agents?
