langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
84.8k stars 13.1k forks source link

Error in AgentExecutor output result #20965

Open jyoti194 opened 2 weeks ago

jyoti194 commented 2 weeks ago

Checked other resources

Example Code

from langchain.sql_database import SQLDatabase from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder,PromptTemplate from langchain.tools import BaseTool from langchain.tools.render import format_tool_to_openai_function from langchain.schema.runnable import Runnable,RunnableLambda,RunnableParallel from langchain.chat_models import ChatOpenAI from langchain.agents.format_scratchpad import format_to_openai_function_messages from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser from langchain.agents import AgentExecutor from pydantic import BaseModel, Field import os from secret_key import openapi_key from sqlalchemy import create_engine import constants from datetime import datetime import re from typing import Optional

os.environ['OPENAI_API_KEY'] = 'sk-xxxxxxxxxxxx'

os.environ['OPENAI_API_KEY'] = openapi_key

def chat5(question: str):

# Define the SQL DML Chain Prompt Template
SQL_DML_CHAIN_PROMPT = """You are an expert in SQLITE. Your main objective is to construct Data manipulation SQLITE query given the
user question: {user_question}.

You need to construct the Data manipulation SQLITE query for the following Database Schema:
{table_info}

Only Output the final SQL-Query and nothing else.

SQL-Query:"""

SQL_DML_CHAIN_PROMPT = """You are expert in MS SQL. Your main objective is to construct Data manipulation MS SQL query give the
user question: {user_question}.

You need to construct the Data manipulation SQLITE query for the following Database Schema:
{table_info}

Check user query for INSERT, UPDATE or DELETE operation, based on that perform the sql query.
Wrapped column names: All column names should be wrapped in square brackets [] as delimiters.
Use GETDATE() to get the current date and time instead of DATETIME('now').
Consider the column name carefully from the PAY_transaction_settingallow table where the modification is done.
Take 'euid' in PAY_transaction_settingallow table from employee_details table which is 'employeeEuid' in employee_details table based on the matching 'employeeName' or 'employeeId' from employee_details table and stores it as 'euid' in the PAY_transaction_settingallow table. Similarly, to get the allowance ID ('alw_id') from the pay_mst_allowance table based on the matching allowance description ('alw_desc') or short name ('short_name'), check both 'alw_desc' and 'short_name'.        
Pay attention to the column names in user query and sql query.
Perform JOIN operation to fetch euid and alw_id from respective tables not INNER JOIN.
Selected table: Specify PAY_transaction_settingallow as the table to update.
Employee and allowance selection: Use the WHERE clause to filter employees based on employeeName and allowances based on alw_desc.
Date handling:
'createDate' should be the date on which the values are inserted, current date, it can not be NULL.
'effect_date' should be the start date of month.
'to_date' should be the end date of month.
'updatedDate' should be the date on which values are updated and it should be included only when UPDATE keywoed is used i query.
UPDATE query should not change the 'createdDate'
Currency: Assume the amount to be in rupees.
If the users gives 'given by' in query enter that value in 'givenBy' column in PAY_transaction_settingallow table
Modify the 'givenBy' column only when given by is specified by user, if not dont take givenBy column. 
Removed newlines: Write the query as a single string without newlines (\n).

Ensure the query executes efficiently and without errors.

If data or Values are already present in the table, dont again run the sql query.
If no modifcation is made in table dont not display the message. 
If no rows are modifyed output its as "0 rows affected"
Only Output the message after execution of SQL-Query 
If duplicate value is encountred with check_duplicate function Output the message like "The SQL query has been skipped due to duplicate data.".

-- Replace DATETIME('now') with GETDATE() for SQL Server compatibility
REPLACE(CAST('' AS VARCHAR(MAX)), 'DATETIME(''now'')', 'GETDATE()')

SQL-Query:
{user_question}

"""

# Define the prompt template
prompt = PromptTemplate(template=SQL_DML_CHAIN_PROMPT, input_variables=['user_question', 'table_info'])

from urllib.parse import quote_plus
server_name = constants.server_name
database_name = constants.database_name
username = constants.username 
password = constants.password
encoded_password = quote_plus(password)
connection_uri = f"mssql+pyodbc://{username}:{encoded_password}@{server_name}/{database_name}?driver=ODBC+Driver+17+for+SQL+Server"

engine = create_engine(connection_uri)
model_name = "get-3.5-turbo-16k"
# db = SQLDatabase(engine, view_support=True, include_tables=['HR_Testing'])
# db = SQLDatabase(engine, view_support=True, include_tables=['EGV_emp_departments_ChatGPT'])
db = SQLDatabase(engine, view_support=True, include_tables=['PAY_transaction_settingallow', 'PAY_mst_allowance','employee_details'],sample_rows_in_table_info=5)

def check_duplicate(query: str, db: SQLDatabase) -> Optional[str]:
    """
    Check for duplicate data before executing an INSERT query.

    Args:
        query (str): The SQL query to be checked.
        db (SQLDatabase): The SQLDatabase object.

    Returns:
        Optional[str]: The original query if no duplicate is found, or None if a duplicate is detected.
    """

    # Define a regular expression pattern to match the INSERT statement
    pattern = r'INSERT INTO (\w+) \((.*?)\) VALUES \((.*?)\)'

    # Attempt to match the pattern in the query
    match = re.match(pattern, query)

    # Check if the pattern matched successfully
    if match:
        # Extract table name, columns, and values from the matched groups
        table_name = match.group(1)
        columns = [col.strip() for col in match.group(2).split(',')]
        values = [val.strip() for val in match.group(3).split(',')]

        # Check if the table name matches the expected table name (e.g., 'PAY_transaction_settingallow')
        if table_name != 'PAY_transaction_settingallow':
            # If the table name doesn't match, return the original query
            return query

        # Extract the values of euid, alw_id, amount, and effect_date from the values list
        euid_index = columns.index('euid')
        alw_id_index = columns.index('alw_id')
        amount_index = columns.index('amount')
        # effect_date_index = columns.index('effect_date')
        euid = values[euid_index]
        alw_id = values[alw_id_index]
        amount = values[amount_index]
        # effect_date = values[effect_date_index]

        # Construct a SELECT query to check for duplicates based on euid, alw_id, and effect_date
        check_query = f"SELECT 1 FROM {table_name} WHERE euid = ? AND alw_id = ? AND amount = ?"

        # Execute the check query with the values of euid, alw_id, and effect_date
        cursor = db.cursor()
        cursor.execute(check_query, (euid, alw_id, amount))

        row = cursor.fetchone()

        # If a row is found, return None to indicate that the query should not be executed
        if row:
            return "The SQL query has been skipped due to duplicate data."
        else:
            # No duplicates found, return the original query
            return query

    # If the pattern didn't match or the table name wasn't recognized, return the original query
    return query

# Define the SQL DML Chain with the modified check_duplicate function
sql_dml_chain = RunnableParallel({
    "user_question": lambda x: question,
    "table_info": lambda _: db.get_table_info()
}) | \
                 prompt | \
                 ChatOpenAI().bind(stop='SQL-Query:') | \
                 RunnableLambda(lambda x:
                                 check_duplicate(x.content, db) if 'INSERT' in x.content else x.content)

# Define the Chat Prompt Template
agent_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", """
        You are expert in SQL whose main objective is to mainpulate the Database for which you have
        been given access. You can use the tool `sql_db_manipulation` to interact with  Database and
        mainpulate the database as per the user requirement.

        Check user query for INSERT, UPDATE or DELETE operation, based on that perform the sql query.
        Wrapped column names: All column names should be wrapped in square brackets [] as delimiters.
        Use GETDATE() to get the current date and time instead of DATETIME('now').
        Consider the column name carefully from the PAY_transaction_settingallow table where the modification is done.
        Take 'euid' in PAY_transaction_settingallow table from employee_details table which is 'employeeEuid' in employee_details table based on the matching 'employeeName' or 'employeeId' from employee_details table and stores it as 'euid' in the PAY_transaction_settingallow table. Similarly, to get the allowance ID ('alw_id') from the pay_mst_allowance table based on the matching allowance description ('alw_desc') or short name ('short_name'), check both 'alw_desc' and 'short_name'.        
        Pay attention to the column names in user query and sql query.
        Perform JOIN operation to fetch euid and alw_id from respective tables not INNER JOIN.
        Selected table: Specify PAY_transaction_settingallow as the table to update.
        Employee and allowance selection: Use the WHERE clause to filter employees based on employeeName and allowances based on alw_desc.
        Date handling:
        'createDate' should be the date on which the values are inserted, current date, it can not be NULL.
        'effect_date' should be the start date of month.
        'to_date' should be the end date of month.
        'updatedDate' should be the date on which values are updated and it should be included only when UPDATE keywoed is used i query.
        UPDATE query should not change the 'createdDate'
        Currency: Assume the amount to be in rupees.
        If the users gives 'given by' in query enter that value in 'givenBy' column in PAY_transaction_settingallow table
        Modify the 'givenBy' column only when given by is specified by user, if not dont take givenBy column. 
        Removed newlines: Write the query as a single string without newlines (\n).

        Ensure the query executes efficiently and without errors.

        If data or Values are already present in the table, dont again run the sql query.
        If no modifcation is made in table dont not display the message. 
        If no rows are modifyed output its as "0 rows affected"
        Only Output the message after execution of SQL-Query 
        If duplicate value is encountred with check_duplicate function Output the message like "The SQL query has been skipped due to duplicate data.".
"""),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

# Define the data model for SQLDBMANIPULATION tool
class SQLDBMANIPULATION(BaseModel):
    user_query: str = Field(description='User question which will be translated to a Data Manipulation SQL Query and will be executed on the underlying database')

class SQLDBMANIPULATIONTool(BaseTool):
    name = "sql_db_manipulation"
    description = "Use this tool to convert and execute DML queries given the user question, Use GETDATE() to get the current date and time instead of DATETIME('now')"
    args_schema: type[SQLDBMANIPULATION] = SQLDBMANIPULATION
    sql_dml_chain: Runnable 

    def _run(
        self, user_query: str
    ) -> str:
        """Use the tool."""
        try:
            if "The SQL query has been skipped due to duplicate data." not in user_query:
                # Execute the SQL query
                affected_rows = db._execute(user_query)

                if affected_rows == 0:
                    # If no rows were affected, return the appropriate message
                    return "0 rows affected"
                else:
                    # Otherwise, return the success message
                    return "The SQL query has been executed successfully."
            else:
                # If the query was skipped due to duplicate data, return the message
                return "The SQL query has been skipped due to duplicate data."
        except Exception as e:
            # If an error occurs during execution
            error_message = f"Error executing SQL query: {str(e)}"
            if "duplicate data" not in error_message.lower():
                # If the error is not due to duplicate data, return the specific error message
                return error_message
            elif "Invalid column name" in error_message:
                # If the error is due to an invalid column name, provide a more informative error message
                return "Error: Invalid column name used in the SQL query."
            else:
                # If the error is due to duplicate data, return the standard message
                return "The SQL query has been skipped due to duplicate data."

tools = [SQLDBMANIPULATIONTool(sql_dml_chain=sql_dml_chain)]
llm_with_tools = ChatOpenAI().bind(functions=[format_tool_to_openai_function(t) for t in tools])
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | agent_prompt 
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

answer = agent_executor.invoke({"input": question})

return answer["output"]

result= chat5("insert personal allowance of 6000 to employee id TA*** for feb 2024, given by 3003") print(result)

Error Message and Stack Trace (if applicable)

Entering new AgentExecutor chain...

Invoking: sql_db_manipulation with {'user_query': "INSERT INTO PAY_transaction_settingallow (euid, alw_id, amount, effect_date, to_date, createDate, givenBy) SELECT ed.employeeEuid, pma.alw_id, 6000, '2024-02-01', '2024-02-29', GETDATE(), '3003' FROM employee_details ed JOIN pay_mst_allowance pma ON ed.employeeName = 'TA*******' AND (pma.alw_desc = 'personal allowance' OR pma.short_name = 'personal allowance') WHERE NOT EXISTS (SELECT 1 FROM PAY_transaction_settingallow WHERE euid = ed.employeeEuid AND alw_id = pma.alw_id AND effect_date = '2024-02-01' AND to_date = '2024-02-29' AND amount = 6000)"}

The SQL query has been executed successfully.The personal allowance of 6000 has been inserted for employee ID TA*** for February 2024, given by 3003.

Description

in the output result of AgentExecutor its give the result as "The personal allowance of 6000 has been inserted for employee ID TA for February 2024, given by 3003. " but there is an error in the sql query written there as it's taking employee name instead of employeeID, in that case it should give the output something like "query is not executed" or "0 rows effectd" instead of giving as "The personal allowance of 6000 has been inserted for employee ID TA for February 2024, given by 3003." in lot of cases its happening the same when its not done any changes in table still its giving the output has "has been inserted" or "has been updated"

i have written a logic to handle it externally, but that doesn't seems to work, so how to resolve this issue?

System Info

python: 3.11 langchain: latest os: windows

jyoti194 commented 2 weeks ago

@dasubot

Joshua-Shepherd commented 1 week ago

These are all LLM issues i'm afraid. You'll need to come up with better ways of formatting your prompts, and maybe hardcoding parts of the SQL query. This isn't really anything wrong with Langchain or AgentExecutor, I don't believe. Can you look in Langsmith and share the execution? Just incase?