langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
89.75k stars 14.19k forks source link

AzureOpenAI and SQLTools with agent -> ValueError(f"Prompt missing required variables: {missing_vars}") #17921

Closed Prince269090 closed 1 month ago

Prince269090 commented 5 months ago

Checked other resources

Example Code

Similar code works with open ai (see https://python.langchain.com/docs/use_cases/sql/agents) and but not with Azure Open AI

Below is my full code

import os

os.environ['OPENAI_API_KEY'] = 'your key'
os.environ['OPENAI_API_TYPE'] = 'azure'
os.environ['OPENAI_API_VERSION'] = '2023-05-15'

from langchain.sql_database import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///Chinook.db")

from langchain_community.agent_toolkits import create_sql_agent
from langchain_openai import AzureOpenAI  # Updated import

from langchain_community.vectorstores import FAISS
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_openai import AzureOpenAIEmbeddings

llm = AzureOpenAI(deployment_name="GPT35Turbo",model_name="gpt-35-turbo"
                  ,azure_endpoint="https://copilots-aoai-df.openai.azure.com/"
                  , temperature=0, verbose=True)

embeddings = AzureOpenAIEmbeddings(
    azure_endpoint="https://copilots-aoai-df.openai.azure.com/",
    azure_deployment="Ada002EmbeddingModel",
    openai_api_version="2023-05-15",
)

examples = [
    {"input": "List all artists.", "query": "SELECT * FROM Artist;"},
    {
        "input": "Find all albums for the artist 'AC/DC'.",
        "query": "SELECT * FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = 'AC/DC');",
    },
    {
        "input": "List all tracks in the 'Rock' genre.",
        "query": "SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock');",
    },
    {
        "input": "Find the total duration of all tracks.",
        "query": "SELECT SUM(Milliseconds) FROM Track;",
    },
    {
        "input": "List all customers from Canada.",
        "query": "SELECT * FROM Customer WHERE Country = 'Canada';",
    },
    {
        "input": "How many tracks are there in the album with ID 5?",
        "query": "SELECT COUNT(*) FROM Track WHERE AlbumId = 5;",
    },
    {
        "input": "Find the total number of invoices.",
        "query": "SELECT COUNT(*) FROM Invoice;",
    },
    {
        "input": "List all tracks that are longer than 5 minutes.",
        "query": "SELECT * FROM Track WHERE Milliseconds > 300000;",
    },
    {
        "input": "Who are the top 5 customers by total purchase?",
        "query": "SELECT CustomerId, SUM(Total) AS TotalPurchase FROM Invoice GROUP BY CustomerId ORDER BY TotalPurchase DESC LIMIT 5;",
    },
    {
        "input": "Which albums are from the year 2000?",
        "query": "SELECT * FROM Album WHERE strftime('%Y', ReleaseDate) = '2000';",
    },
    {
        "input": "How many employees are there",
        "query": 'SELECT COUNT(*) FROM "Employee"',
    },
]

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    embeddings,
    FAISS,
    k=5,
    input_keys=["input"],
)

from langchain_core.prompts import (
    ChatPromptTemplate,
    FewShotPromptTemplate,
    MessagesPlaceholder,
    PromptTemplate,
    SystemMessagePromptTemplate,
)

system_prefix = """You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
Only use the given tools. Only use the information returned by the tools to construct your final answer.
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.

If the question does not seem related to the database, just return "I don't know" as the answer.

Here are some examples of user inputs and their corresponding SQL queries:"""

few_shot_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=PromptTemplate.from_template(
        "User input: {input}\nSQL query: {query}"
    ),
    input_variables=["input", "dialect", "top_k"],
    prefix=system_prefix,
    suffix=""
)

full_prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessagePromptTemplate(prompt=few_shot_prompt),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),
    ]
)

# Example formatted prompt
prompt_val = full_prompt.invoke(
    {
        "input": "How many arists are there",
        "top_k": 5,
        "dialect": "SQLite",
        "agent_scratchpad": [],
    }
)
print(prompt_val.to_string())

from langchain.agents.agent_types import AgentType

from langchain_community.agent_toolkits import SQLDatabaseToolkit
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

# Get list of tools
tools = toolkit.get_tools()

agent = create_sql_agent(
    llm=llm,
    db = db,
    tools = tools,
    prompt=full_prompt,
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

agent.invoke({"input": "list top 10 customers by total purchase?"})

Error Message and Stack Trace (if applicable)

No response

Description

raceback (most recent call last): File "C:\Projects\Personal\Langchain-DB-Example\main-sqlite-azure.py", line 142, in agent = create_sql_agent( File "C:\Projects\Personal\Langchain-DB-Example.venv\lib\site-packages\langchain_community\agent_toolkits\sql\base.py", line 182, in create_sql_agent runnable=create_react_agent(llm, tools, prompt), File "C:\Projects\Personal\Langchain-DB-Example.venv\lib\site-packages\langchain\agents\react\agent.py", line 97, in create_react_agent raise ValueError(f"Prompt missing required variables: {missing_vars}") ValueError: Prompt missing required variables: {'tools', 'tool_names'}

System Info

System Information

OS: Windows OS Version: 10.0.22621 Python Version: 3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)]

Package Information

langchain_core: 0.1.25 langchain: 0.1.8 langchain_community: 0.0.21 langsmith: 0.1.5 langchain_experimental: 0.0.52 langchain_openai: 0.0.6

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph langserve

PIP freeze

aiohttp==3.9.3 aiosignal==1.3.1 annotated-types==0.6.0 anyio==4.3.0 async-timeout==4.0.3 attrs==23.2.0 certifi==2024.2.2 charset-normalizer==3.3.2 colorama==0.4.6 dataclasses-json==0.6.4 distro==1.9.0 exceptiongroup==1.2.0 faiss-cpu==1.7.4 frozenlist==1.4.1 greenlet==3.0.3 h11==0.14.0 httpcore==1.0.3 httpx==0.26.0 idna==3.6 jsonpatch==1.33 jsonpointer==2.4 langchain==0.1.8 langchain-community==0.0.21 langchain-core==0.1.25 langchain-experimental==0.0.52 langchain-openai==0.0.6 langsmith==0.1.5 marshmallow==3.20.2 multidict==6.0.5 mypy-extensions==1.0.0 numpy==1.26.4 openai==1.12.0 packaging==23.2 pydantic==2.6.1 pydantic_core==2.16.2 PyYAML==6.0.1 regex==2023.12.25 requests==2.31.0 sniffio==1.3.0 SQLAlchemy==2.0.27 tenacity==8.2.3 tiktoken==0.6.0 tqdm==4.66.2 typing-inspect==0.9.0 typing_extensions==4.9.0 urllib3==2.2.1 yarl==1.9.4

liugddx commented 5 months ago

use agent_type="openai-tools"

Prince269090 commented 5 months ago

@liugddx , That is giving another error and i have already tried previously.

TypeError: create() got an unexpected keyword argument 'tools'

image

RadhikaBansal97 commented 5 months ago

Hi @Prince269090,

When you pass agent type as ZERO_SHOT_REACT_DESCRIPTION, create_sql_agent calls create_react_agent internally which requires the prompt template to be in ReAct prompt(react_promt_template)format, so you need to modify the prompt accordingly.

You can follow this link to know more about ReAct prompt formatting - https://python.langchain.com/docs/modules/agents/agent_types/react

Below is the code that you can use -

import os
from langchain.sql_database import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///Chinook.db")
from langchain_community.agent_toolkits import create_sql_agent
from langchain_core.prompts import (
    FewShotPromptTemplate,
    PromptTemplate,
)
from langchain.agents.agent_types import AgentType
from langchain_openai import AzureChatOpenAI

os.environ["AZURE_OPENAI_API_KEY"] = "..."
os.environ["AZURE_OPENAI_ENDPOINT"] = "..."

llm = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_deployment="gpt-35-turbo",
)

examples = [
    {"input": "List all artists.", "query": "SELECT * FROM Artist;"},
    {
        "input": "Find all albums for the artist 'AC/DC'.",
        "query": "SELECT * FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = 'AC/DC');",
    },
    {
        "input": "List all tracks in the 'Rock' genre.",
        "query": "SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock');",
    },
    {
        "input": "Find the total duration of all tracks.",
        "query": "SELECT SUM(Milliseconds) FROM Track;",
    },
    {
        "input": "List all customers from Canada.",
        "query": "SELECT * FROM Customer WHERE Country = 'Canada';",
    },
    {
        "input": "How many tracks are there in the album with ID 5?",
        "query": "SELECT COUNT(*) FROM Track WHERE AlbumId = 5;",
    },
    {
        "input": "Find the total number of invoices.",
        "query": "SELECT COUNT(*) FROM Invoice;",
    },
    {
        "input": "List all tracks that are longer than 5 minutes.",
        "query": "SELECT * FROM Track WHERE Milliseconds > 300000;",
    },
    {
        "input": "Who are the top 5 customers by total purchase?",
        "query": "SELECT CustomerId, SUM(Total) AS TotalPurchase FROM Invoice GROUP BY CustomerId ORDER BY TotalPurchase DESC LIMIT 5;",
    },
    {
        "input": "Which albums are from the year 2000?",
        "query": "SELECT * FROM Album WHERE strftime('%Y', ReleaseDate) = '2000';",
    },
    {
        "input": "How many employees are there",
        "query": 'SELECT COUNT(*) FROM "Employee"',
    },
]
PREFIX = """You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
You have access to the following tools:{tools}. Only use the information returned by the tools to construct your final answer.
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.

If the question does not seem related to the database, just return "I don't know" as the answer.

Here are some examples of user inputs and their corresponding SQL queries:"""

FORMAT_INSTRUCTIONS = """Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question"""
SUFFIX = """Begin!

Question: {input}
Thought:{agent_scratchpad}"""

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=PromptTemplate.from_template(
        "User input: {input}\nSQL query: {query}"
    ),
    input_variables=["input", "dialect", "top_k", "tools", "tool_names"],
    prefix=PREFIX,
    suffix=FORMAT_INSTRUCTIONS + SUFFIX
)

agent = create_sql_agent(
    llm=llm,
    db=db,
    prompt=few_shot_prompt,
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

agent.invoke({"input": "List the total sales per country. Which country's customers spent the most?"})

Let me know if this works for you!!

konradbjk commented 5 months ago

For me the same versions cause a critical error on trying to set up AzureOpenAIEmbedding...

Have you experienced

1 validation error for AzureOpenAIEmbeddings
__root__
  As of openai>=1.0.0, Azure endpoints should be specified via the `azure_endpoint` param not `openai_api_base` (or alias `base_url`).  (type=value_error)
  File "<REDACTED>", line 12, in <module>
    azure_embeddings = AzureOpenAIEmbeddings(
pydantic.v1.error_wrappers.ValidationError: 1 validation error for AzureOpenAIEmbeddings
__root__
  As of openai>=1.0.0, Azure endpoints should be specified via the `azure_endpoint` param not `openai_api_base` (or alias `base_url`).  (type=value_error)

https://github.com/langchain-ai/langchain/issues/18099

zhangucan commented 4 months ago

Hi @Prince269090,

When you pass agent type as ZERO_SHOT_REACT_DESCRIPTION, create_sql_agent calls create_react_agent internally which requires the prompt template to be in ReAct prompt(react_promt_template)format, so you need to modify the prompt accordingly.

You can follow this link to know more about ReAct prompt formatting - https://python.langchain.com/docs/modules/agents/agent_types/react

Below is the code that you can use -

import os
from langchain.sql_database import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///Chinook.db")
from langchain_community.agent_toolkits import create_sql_agent
from langchain_core.prompts import (
    FewShotPromptTemplate,
    PromptTemplate,
)
from langchain.agents.agent_types import AgentType
from langchain_openai import AzureChatOpenAI

os.environ["AZURE_OPENAI_API_KEY"] = "..."
os.environ["AZURE_OPENAI_ENDPOINT"] = "..."

llm = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_deployment="gpt-35-turbo",
)

examples = [
    {"input": "List all artists.", "query": "SELECT * FROM Artist;"},
    {
        "input": "Find all albums for the artist 'AC/DC'.",
        "query": "SELECT * FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = 'AC/DC');",
    },
    {
        "input": "List all tracks in the 'Rock' genre.",
        "query": "SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock');",
    },
    {
        "input": "Find the total duration of all tracks.",
        "query": "SELECT SUM(Milliseconds) FROM Track;",
    },
    {
        "input": "List all customers from Canada.",
        "query": "SELECT * FROM Customer WHERE Country = 'Canada';",
    },
    {
        "input": "How many tracks are there in the album with ID 5?",
        "query": "SELECT COUNT(*) FROM Track WHERE AlbumId = 5;",
    },
    {
        "input": "Find the total number of invoices.",
        "query": "SELECT COUNT(*) FROM Invoice;",
    },
    {
        "input": "List all tracks that are longer than 5 minutes.",
        "query": "SELECT * FROM Track WHERE Milliseconds > 300000;",
    },
    {
        "input": "Who are the top 5 customers by total purchase?",
        "query": "SELECT CustomerId, SUM(Total) AS TotalPurchase FROM Invoice GROUP BY CustomerId ORDER BY TotalPurchase DESC LIMIT 5;",
    },
    {
        "input": "Which albums are from the year 2000?",
        "query": "SELECT * FROM Album WHERE strftime('%Y', ReleaseDate) = '2000';",
    },
    {
        "input": "How many employees are there",
        "query": 'SELECT COUNT(*) FROM "Employee"',
    },
]
PREFIX = """You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
You have access to the following tools:{tools}. Only use the information returned by the tools to construct your final answer.
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.

If the question does not seem related to the database, just return "I don't know" as the answer.

Here are some examples of user inputs and their corresponding SQL queries:"""

FORMAT_INSTRUCTIONS = """Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question"""
SUFFIX = """Begin!

Question: {input}
Thought:{agent_scratchpad}"""

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=PromptTemplate.from_template(
        "User input: {input}\nSQL query: {query}"
    ),
    input_variables=["input", "dialect", "top_k", "tools", "tool_names"],
    prefix=PREFIX,
    suffix=FORMAT_INSTRUCTIONS + SUFFIX
)

agent = create_sql_agent(
    llm=llm,
    db=db,
    prompt=few_shot_prompt,
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

agent.invoke({"input": "List the total sales per country. Which country's customers spent the most?"})

Let me know if this works for you!!

The problem has been solved.