langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
89.66k stars 14.17k forks source link

SQLDatabaseChain runs under the GPT4 model and reports an error #7691

Closed nickscut closed 8 months ago

nickscut commented 1 year ago

System Info

my python code

from langchain import OpenAI, SQLDatabase, SQLDatabaseChain from langchain.chains import SQLDatabaseSequentialChain

db = SQLDatabase.from_uri("clickhouse://xx:xx@ip/db", include_tables=include_tables, custom_table_info=custom_table_schemas, sample_rows_in_table_info=2)

llm = OpenAI(temperature=0, model_name="gpt-4-0613", verbose=True, streaming=True, openai_api_base="https://xxx.cn/v1")

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, use_query_checker=True, return_intermediate_steps=True, top_k=3)

instruction = "Statistics EL_C1 device uptime today" result = db_chain(instruction) result["intermediate_steps"]

run result

image DatabaseException: Orig exception: Code: 62. DB::Exception: Syntax error: failed at position 1 ('The') (line 1, col 1): The original query seems to be correct as it doesn't have any of the common mistakes mentioned. Here is the reproduction of the original query:

I don't know why these strings are being run as SQL

if set use_query_checker = False

image There is an extra double quote in SQL

DatabaseException: Orig exception: Code: 62. DB::Exception: Syntax error: failed at position 1 ('"SELECT SUM(value) FROM idap_asset.EL_MODEL_Run_Time WHERE asset_code = 'EL_C1' AND toDate(CAST(window_end / 1000, 'DateTime')) = today()"'): "SELECT SUM(value) FROM idap_asset.EL_MODEL_Run_Time WHERE asset_code = 'EL_C1' AND toDate(CAST(window_end / 1000, 'DateTime')) = today()".

All these errors are only generated under the GPT4 model. If the default model is used, no errors are generated

Who can help?

No response

Information

Related Components

Reproduction

llm = OpenAI(temperature=0, model_name="gpt-4-0613", verbose=True, streaming=True, openai_api_base="https://xxx.cn/v1")

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, use_query_checker=False, return_intermediate_steps=True, top_k=3)

As long as the GPT4 model is used, there will be problems through the db chain, and the string returned by the model will be executed as SQL

Expected behavior

image

The hope is to generate correct SQL execution, do not treat the string returned by the model as SQL execution

dosubot[bot] commented 1 year ago

Answer generated by a 🤖

Answer

From your description, it appears that you're encountering issues with the GPT-4 model generating incorrect SQL queries when used with the SQLDatabaseChain class in the LangChain framework. When the use_query_checker attribute is set to True, the SQL query generated by the language model is checked and potentially corrected before execution. However, if the generated SQL query is incorrect and cannot be corrected, an exception is raised.

In your case, the GPT-4 model seems to be generating SQL queries that are incorrect and cannot be corrected by the query checker, hence the exceptions. When you set use_query_checker to False, the incorrect SQL query is executed directly, leading to a syntax error.

Unfortunately, I can't see the images or links you've provided as I'm an AI and can only process text. However, based on your description, here are a few suggestions that might help:

  1. Review the SQL queries being generated by the GPT-4 model to identify any patterns in the errors. This might help you understand why the model is generating incorrect queries.

  2. If the errors are consistent and predictable, you might be able to implement a custom query checker to correct them.

  3. If the GPT-4 model consistently generates incorrect SQL queries, you might want to consider using a different model. As you mentioned, the default model does not generate these errors.

Here's the relevant code snippet from the SQLDatabaseChain class that handles the SQL query generation and execution:

checked_sql_command: str = query_checker_chain.predict(
                    callbacks=_run_manager.get_child(), **query_checker_inputs
                ).strip()
                intermediate_steps.append(
                    checked_sql_command
                )  # output: sql generation (checker)
                _run_manager.on_text(
                    checked_sql_command, color="green", verbose=self.verbose
                )
                intermediate_steps.append(
                    {"sql_cmd": checked_sql_command}
                )  # input: sql exec
                result = self.database.run(checked_sql_command)
                intermediate_steps.append(str(result))  # output: sql exec
                sql_cmd = checked_sql_command

This code generates the SQL query (checked_sql_command), executes it (result = self.database.run(checked_sql_command)), and handles any exceptions that occur.

Please note that this is just one potential solution based on the information provided. If you believe this is a bug that could impact other users, you're encouraged to make a pull request with a proposed fix. If my response contradicts your experience, I would recommend upgrading to the latest version of LangChain to ensure you have the most recent updates and bug fixes.

Sources


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

keenborder786 commented 1 year ago

@nickscut , Can you please post the Traceback of your Error.

nickscut commented 1 year ago

Entering new chain... 统计EL_C1设备今天正常运行时间 SQLQuery:"SELECT SUM(value) FROM idap_asset.EL_MODEL_Run_Time WHERE asset_code = 'EL_C1' AND toDate(CAST(window_end / 1000, 'DateTime')) = today()"

DatabaseException Traceback (most recent call last) Cell In[332], line 19 13 instruction = "统计EL_C1设备今天正常运行时间"

---> 19 result = db_chain(instruction) 20 result["intermediate_steps"]

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/base.py:166, in Chain.call(self, inputs, return_only_outputs, callbacks, tags, include_run_info) 164 except (KeyboardInterrupt, Exception) as e: 165 run_manager.on_chain_error(e) --> 166 raise e 167 run_manager.on_chain_end(outputs) 168 final_outputs: Dict[str, Any] = self.prep_outputs( 169 inputs, outputs, return_only_outputs 170 )

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/base.py:160, in Chain.call(self, inputs, return_only_outputs, callbacks, tags, include_run_info) 154 run_manager = callback_manager.on_chain_start( 155 dumpd(self), 156 inputs, 157 ) 158 try: 159 outputs = ( --> 160 self._call(inputs, run_manager=run_manager) 161 if new_arg_supported 162 else self._call(inputs) 163 ) 164 except (KeyboardInterrupt, Exception) as e: 165 run_manager.on_chain_error(e)

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/sql_database/base.py:181, in SQLDatabaseChain._call(self, inputs, run_manager) 177 except Exception as exc: 178 # Append intermediate steps to exception, to aid in logging and later 179 # improvement of few shot prompt seeds 180 exc.intermediate_steps = intermediate_steps # type: ignore --> 181 raise exc

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/sql_database/base.py:126, in SQLDatabaseChain._call(self, inputs, run_manager) 122 intermediate_steps.append( 123 sql_cmd 124 ) # output: sql generation (no checker) 125 intermediate_steps.append({"sql_cmd": sql_cmd}) # input: sql exec --> 126 result = self.database.run(sql_cmd) 127 intermediate_steps.append(str(result)) # output: sql exec 128 else:

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/sql_database.py:348, in SQLDatabase.run(self, command, fetch) 346 else: 347 connection.exec_driver_sql(f"SET search_path TO {self._schema}") --> 348 cursor = connection.execute(text(command)) 349 if cursor.returns_rows: 350 if fetch == "all":

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1306, in Connection.execute(self, statement, *multiparams, **params) 1302 util.raise_( 1303 exc.ObjectNotExecutableError(statement), replace_context=err 1304 ) 1305 else: -> 1306 return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/sql/elements.py:332, in ClauseElement._execute_on_connection(self, connection, multiparams, params, execution_options, _force) 328 def _execute_on_connection( 329 self, connection, multiparams, params, execution_options, _force=False 330 ): 331 if _force or self.supports_execution: --> 332 return connection._execute_clauseelement( 333 self, multiparams, params, execution_options 334 ) 335 else: 336 raise exc.ObjectNotExecutableError(self)

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1498, in Connection._execute_clauseelement(self, elem, multiparams, params, execution_options) 1486 compiled_cache = execution_options.get( 1487 "compiled_cache", self.engine._compiled_cache 1488 ) 1490 compiled_sql, extracted_params, cache_hit = elem._compile_w_cache( 1491 dialect=dialect, 1492 compiled_cache=compiled_cache, (...) 1496 linting=self.dialect.compiler_linting | compiler.WARN_LINTING, 1497 ) -> 1498 ret = self._execute_context( 1499 dialect, 1500 dialect.execution_ctx_cls._init_compiled, 1501 compiled_sql, 1502 distilled_params, 1503 execution_options, 1504 compiled_sql, 1505 distilled_params, 1506 elem, 1507 extracted_params, 1508 cache_hit=cache_hit, 1509 ) 1510 if has_events: 1511 self.dispatch.after_execute( 1512 self, 1513 elem, (...) 1517 ret, 1518 )

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1862, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw) 1859 branched.close() 1861 except BaseException as e: -> 1862 self._handle_dbapi_exception( 1863 e, statement, parameters, cursor, context 1864 ) 1866 return result

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:2047, in Connection._handle_dbapiexception(self, e, statement, parameters, cursor, context) 2043 util.raise( 2044 sqlalchemy_exception, with_traceback=excinfo[2], from=e 2045 ) 2046 else: -> 2047 util.raise_(exc_info[1], with_traceback=exc_info[2]) 2049 finally: 2050 del self._reentrant_error

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/util/compat.py:208, in raise_(failed resolving arguments) 205 exception.cause = replace_context 207 try: --> 208 raise exception 209 finally: 210 # credit to 211 # https://cosmicpercolator.com/2016/01/13/exception-leaks-in-python-2-and-3/ 212 # as the traceback object creates a cycle 213 del exception, replacecontext, from, with_traceback

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1819, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw) 1817 break 1818 if not evt_handled: -> 1819 self.dialect.do_execute( 1820 cursor, statement, parameters, context 1821 ) 1823 if self._has_events or self.engine._has_events: 1824 self.dispatch.after_cursor_execute( 1825 self, 1826 cursor, (...) 1830 context.executemany, 1831 )

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/base.py:416, in ClickHouseDialect.do_execute(self, cursor, statement, parameters, context) 415 def do_execute(self, cursor, statement, parameters, context=None): --> 416 cursor.execute(statement, parameters, context=context)

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/http/connector.py:117, in Cursor.execute(self, operation, parameters, context) 114 params = {'query_id': self._query_id} 115 response_gen = transport.execute(raw_sql, params=params) --> 117 self._process_response(response_gen) 118 self._end_query()

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/http/connector.py:216, in Cursor._process_response(self, response) 213 def _process_response(self, response): 214 response = iter(response) --> 216 self._columns = next(response, None) 217 self._types = next(response, None) 218 self._response = response

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/http/transport.py:133, in RequestsTransport.execute(self, query, params) 128 def execute(self, query, params=None): 129 """ 130 Query is returning rows and these rows should be parsed or 131 there is nothing to return. 132 """ --> 133 r = self._send(query, params=params, stream=True) 134 lines = r.iter_lines() 135 try:

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/http/transport.py:179, in RequestsTransport._send(self, data, params, stream) 177 orig = HTTPException(r.text) 178 orig.code = r.status_code --> 179 raise DatabaseException(orig) 180 return r

DatabaseException: Orig exception: Code: 62. DB::Exception: Syntax error: failed at position 1 ('"SELECT SUM(value) FROM idap_asset.EL_MODEL_Run_Time WHERE asset_code = 'EL_C1' AND toDate(CAST(window_end / 1000, 'DateTime')) = today()"'): "SELECT SUM(value) FROM idap_asset.EL_MODEL_Run_Time WHERE asset_code = 'EL_C1' AND toDate(CAST(window_end / 1000, 'DateTime')) = today()". Expected one of: Query, Query with output, EXPLAIN, SELECT query, possibly with UNION, list of union elements, SELECT query, subquery, possibly with UNION, SELECT subquery, SELECT query, WITH, SELECT, SHOW CREATE QUOTA query, SHOW CREATE, SHOW [TEMPORARY] TABLES|DATABASES|CLUSTERS|CLUSTER 'name' [[NOT] [I]LIKE 'str'] [LIMIT expr], SHOW, EXISTS or SHOW CREATE query, EXISTS, DESCRIBE query, DESCRIBE, DESC, SHOW PROCESSLIST query, SHOW PROCESSLIST, CREATE TABLE or ATTACH TABLE query, CREATE, ATTACH, REPLACE, CREATE DATABASE query, CREATE VIEW query, CREATE DICTIONARY, CREATE LIVE VIEW query, CREATE WINDOW VIEW query, ALTER query, ALTER TABLE, ALTER LIVE VIEW, ALTER DATABASE, RENAME query, RENAME TABLE, EXCHANGE TABLES, RENAME DICTIONARY, EXCHANGE DICTIONARIES, RENAME DATABASE, DROP query, DROP, DETACH, TRUNCATE, CHECK TABLE, KILL QUERY query, KILL, OPTIMIZE query, OPTIMIZE TABLE, WATCH query, WATCH, SHOW ACCESS query, SHOW ACCESS, ShowAccessEntitiesQuery, SHOW GRANTS query, SHOW GRANTS, SHOW PRIVILEGES query, SHOW PRIVILEGES, INSERT query, INSERT INTO, USE query, USE, SET ROLE or SET DEFAULT ROLE query, SET ROLE DEFAULT, SET ROLE, SET DEFAULT ROLE, SET query, SET, SYSTEM query, SYSTEM, CREATE USER or ALTER USER query, ALTER USER, CREATE USER, CREATE ROLE or ALTER ROLE query, ALTER ROLE, CREATE ROLE, CREATE QUOTA or ALTER QUOTA query, ALTER QUOTA, CREATE QUOTA, CREATE ROW POLICY or ALTER ROW POLICY query, ALTER POLICY, ALTER ROW POLICY, CREATE POLICY, CREATE ROW POLICY, CREATE SETTINGS PROFILE or ALTER SETTINGS PROFILE query, ALTER SETTINGS PROFILE, ALTER PROFILE, CREATE SETTINGS PROFILE, CREATE PROFILE, CREATE FUNCTION query, DROP FUNCTION query, DROP access entity query, GRANT or REVOKE query, REVOKE, GRANT, EXTERNAL DDL query, EXTERNAL DDL FROM, TCL query, BEGIN TRANSACTION, COMMIT, ROLLBACK, SET TRANSACTION SNAPSHOT, BACKUP or RESTORE query, BACKUP, RESTORE. (SYNTAX_ERROR) (version 22.5.1.2079 (official build))

nickscut commented 1 year ago

@keenborder786

nickscut commented 1 year ago

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, use_query_checker=True, return_intermediate_steps=True, top_k=3)

if set use_query_checker=True

then Traceback of my Error @keenborder786

Entering new chain... 统计EL_C1设备今天正常运行时间 SQLQuery:The original query seems to be correct as it doesn't contain any of the common mistakes listed. Here is the reproduction of the original query:

"SELECT asset_code, value FROM idap_asset.EL_MODEL_Run_Time WHERE asset_code = 'EL_C1' AND toDate(CAST(window_end / 1000, 'DateTime')) = today() LIMIT 3"

DatabaseException Traceback (most recent call last) Cell In[334], line 19 13 instruction = "统计EL_C1设备今天正常运行时间" ---> 19 result = db_chain(instruction) 20 result["intermediate_steps"]

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/base.py:166, in Chain.call(self, inputs, return_only_outputs, callbacks, tags, include_run_info) 164 except (KeyboardInterrupt, Exception) as e: 165 run_manager.on_chain_error(e) --> 166 raise e 167 run_manager.on_chain_end(outputs) 168 final_outputs: Dict[str, Any] = self.prep_outputs( 169 inputs, outputs, return_only_outputs 170 )

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/base.py:160, in Chain.call(self, inputs, return_only_outputs, callbacks, tags, include_run_info) 154 run_manager = callback_manager.on_chain_start( 155 dumpd(self), 156 inputs, 157 ) 158 try: 159 outputs = ( --> 160 self._call(inputs, run_manager=run_manager) 161 if new_arg_supported 162 else self._call(inputs) 163 ) 164 except (KeyboardInterrupt, Exception) as e: 165 run_manager.on_chain_error(e)

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/sql_database/base.py:181, in SQLDatabaseChain._call(self, inputs, run_manager) 177 except Exception as exc: 178 # Append intermediate steps to exception, to aid in logging and later 179 # improvement of few shot prompt seeds 180 exc.intermediate_steps = intermediate_steps # type: ignore --> 181 raise exc

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/sql_database/base.py:151, in SQLDatabaseChain._call(self, inputs, run_manager) 145 _run_manager.on_text( 146 checked_sql_command, color="green", verbose=self.verbose 147 ) 148 intermediate_steps.append( 149 {"sql_cmd": checked_sql_command} 150 ) # input: sql exec --> 151 result = self.database.run(checked_sql_command) 152 intermediate_steps.append(str(result)) # output: sql exec 153 sql_cmd = checked_sql_command

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/sql_database.py:348, in SQLDatabase.run(self, command, fetch) 346 else: 347 connection.exec_driver_sql(f"SET search_path TO {self._schema}") --> 348 cursor = connection.execute(text(command)) 349 if cursor.returns_rows: 350 if fetch == "all":

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1306, in Connection.execute(self, statement, *multiparams, **params) 1302 util.raise_( 1303 exc.ObjectNotExecutableError(statement), replace_context=err 1304 ) 1305 else: -> 1306 return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/sql/elements.py:332, in ClauseElement._execute_on_connection(self, connection, multiparams, params, execution_options, _force) 328 def _execute_on_connection( 329 self, connection, multiparams, params, execution_options, _force=False 330 ): 331 if _force or self.supports_execution: --> 332 return connection._execute_clauseelement( 333 self, multiparams, params, execution_options 334 ) 335 else: 336 raise exc.ObjectNotExecutableError(self)

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1498, in Connection._execute_clauseelement(self, elem, multiparams, params, execution_options) 1486 compiled_cache = execution_options.get( 1487 "compiled_cache", self.engine._compiled_cache 1488 ) 1490 compiled_sql, extracted_params, cache_hit = elem._compile_w_cache( 1491 dialect=dialect, 1492 compiled_cache=compiled_cache, (...) 1496 linting=self.dialect.compiler_linting | compiler.WARN_LINTING, 1497 ) -> 1498 ret = self._execute_context( 1499 dialect, 1500 dialect.execution_ctx_cls._init_compiled, 1501 compiled_sql, 1502 distilled_params, 1503 execution_options, 1504 compiled_sql, 1505 distilled_params, 1506 elem, 1507 extracted_params, 1508 cache_hit=cache_hit, 1509 ) 1510 if has_events: 1511 self.dispatch.after_execute( 1512 self, 1513 elem, (...) 1517 ret, 1518 )

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1862, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw) 1859 branched.close() 1861 except BaseException as e: -> 1862 self._handle_dbapi_exception( 1863 e, statement, parameters, cursor, context 1864 ) 1866 return result

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:2047, in Connection._handle_dbapiexception(self, e, statement, parameters, cursor, context) 2043 util.raise( 2044 sqlalchemy_exception, with_traceback=excinfo[2], from=e 2045 ) 2046 else: -> 2047 util.raise_(exc_info[1], with_traceback=exc_info[2]) 2049 finally: 2050 del self._reentrant_error

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/util/compat.py:208, in raise_(failed resolving arguments) 205 exception.cause = replace_context 207 try: --> 208 raise exception 209 finally: 210 # credit to 211 # https://cosmicpercolator.com/2016/01/13/exception-leaks-in-python-2-and-3/ 212 # as the traceback object creates a cycle 213 del exception, replacecontext, from, with_traceback

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1819, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw) 1817 break 1818 if not evt_handled: -> 1819 self.dialect.do_execute( 1820 cursor, statement, parameters, context 1821 ) 1823 if self._has_events or self.engine._has_events: 1824 self.dispatch.after_cursor_execute( 1825 self, 1826 cursor, (...) 1830 context.executemany, 1831 )

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/base.py:416, in ClickHouseDialect.do_execute(self, cursor, statement, parameters, context) 415 def do_execute(self, cursor, statement, parameters, context=None): --> 416 cursor.execute(statement, parameters, context=context)

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/http/connector.py:117, in Cursor.execute(self, operation, parameters, context) 114 params = {'query_id': self._query_id} 115 response_gen = transport.execute(raw_sql, params=params) --> 117 self._process_response(response_gen) 118 self._end_query()

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/http/connector.py:216, in Cursor._process_response(self, response) 213 def _process_response(self, response): 214 response = iter(response) --> 216 self._columns = next(response, None) 217 self._types = next(response, None) 218 self._response = response

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/http/transport.py:133, in RequestsTransport.execute(self, query, params) 128 def execute(self, query, params=None): 129 """ 130 Query is returning rows and these rows should be parsed or 131 there is nothing to return. 132 """ --> 133 r = self._send(query, params=params, stream=True) 134 lines = r.iter_lines() 135 try:

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/http/transport.py:179, in RequestsTransport._send(self, data, params, stream) 177 orig = HTTPException(r.text) 178 orig.code = r.status_code --> 179 raise DatabaseException(orig) 180 return r

DatabaseException: Orig exception: Code: 62. DB::Exception: Syntax error: failed at position 1 ('The') (line 1, col 1): The original query seems to be correct as it doesn't contain any of the common mistakes listed. Here is the reproduction of the original query:

"SELECT `asset_. Expected one of: Query, Query with output, EXPLAIN, SELECT query, possibly with UNION, list of union elements, SELECT query, subquery, possibly with UNION, SELECT subquery, SELECT query, WITH, SELECT, SHOW CREATE QUOTA query, SHOW CREATE, SHOW [TEMPORARY] TABLES|DATABASES|CLUSTERS|CLUSTER 'name' [[NOT] [I]LIKE 'str'] [LIMIT expr], SHOW, EXISTS or SHOW CREATE query, EXISTS, DESCRIBE query, DESCRIBE, DESC, SHOW PROCESSLIST query, SHOW PROCESSLIST, CREATE TABLE or ATTACH TABLE query, CREATE, ATTACH, REPLACE, CREATE DATABASE query, CREATE VIEW query, CREATE DICTIONARY, CREATE LIVE VIEW query, CREATE WINDOW VIEW query, ALTER query, ALTER TABLE, ALTER LIVE VIEW, ALTER DATABASE, RENAME query, RENAME TABLE, EXCHANGE TABLES, RENAME DICTIONARY, EXCHANGE DICTIONARIES, RENAME DATABASE, DROP query, DROP, DETACH, TRUNCATE, CHECK TABLE, KILL QUERY query, KILL, OPTIMIZE query, OPTIMIZE TABLE, WATCH query, WATCH, SHOW ACCESS query, SHOW ACCESS, ShowAccessEntitiesQuery, SHOW GRANTS query, SHOW GRANTS, SHOW PRIVILEGES query, SHOW PRIVILEGES, INSERT query, INSERT INTO, USE query, USE, SET ROLE or SET DEFAULT ROLE query, SET ROLE DEFAULT, SET ROLE, SET DEFAULT ROLE, SET query, SET, SYSTEM query, SYSTEM, CREATE USER or ALTER USER query, ALTER USER, CREATE USER, CREATE ROLE or ALTER ROLE query, ALTER ROLE, CREATE ROLE, CREATE QUOTA or ALTER QUOTA query, ALTER QUOTA, CREATE QUOTA, CREATE ROW POLICY or ALTER ROW POLICY query, ALTER POLICY, ALTER ROW POLICY, CREATE POLICY, CREATE ROW POLICY, CREATE SETTINGS PROFILE or ALTER SETTINGS PROFILE query, ALTER SETTINGS PROFILE, ALTER PROFILE, CREATE SETTINGS PROFILE, CREATE PROFILE, CREATE FUNCTION query, DROP FUNCTION query, DROP access entity query, GRANT or REVOKE query, REVOKE, GRANT, EXTERNAL DDL query, EXTERNAL DDL FROM, TCL query, BEGIN TRANSACTION, COMMIT, ROLLBACK, SET TRANSACTION SNAPSHOT, BACKUP or RESTORE query, BACKUP, RESTORE. (SYNTAX_ERROR) (version 22.5.1.2079 (official build))

nickscut commented 1 year ago

I have tried to solve this problem

modify:langchain/chains/sql_database/base.py

@staticmethod
def extract_sql_command(text: str) -> str:
    # Create a regular expression pattern that matches the SQL command
    pattern = r"(SELECT[\s\S]*LIMIT \d+)"

    # Use the re.search function to search the text for the pattern
    match = re.search(pattern, text, re.IGNORECASE)

    # If a match was found, return the matched string
    if match:
        return match.group(1).strip()

    # If no match was found, return the original text
    return text

checked_sql_command: str = self.extract_sql_command(query_checker_chain.predict( callbacks=_run_manager.get_child(), **query_checker_inputs ).strip())

it works fine for gpt4 model

keenborder786 commented 1 year ago

@nickscut thank you will have a look

yuan-thomas commented 1 year ago

@nickscut thank you will have a look

Had a look in the past few days. Basically it's the disparity between completion models and chat models. Whilst completion models such as da-vinci simply reproduces the SQL query following instruction, chat models like gpt-3.5 or gpt-4 tend to add comments upfront to explain the result.

I have tried to tune tools/sqldatabase/prompt.py to suppress the excessive output, but in vain. Regardless how I changed, chat models always add something before the SQL.

So to me, it seems one of the following options:

Erickrus commented 1 year ago

I have a robust version of extract_sql_command(...) for SQL extraction from any sql and text mixture. If pass return_all=True, you can get all valid sql found in the given text. This method support wide ranges of sql statement including DML, DDL etc. Limitation, if there's anything like {var} or {{var}} in the sql, the parser will stop, only extract valid part of sql before this kind of error.

#!pip install -q sqlfluff>=2.1.3
#!pip install -q sqlparse>=0.4.4
import re
import sqlfluff
import sqlparse

@staticmethod
def extract_sql_command(
    text: str, 
    dialect: str='ansi', 
    return_all: bool=False
    ) -> str:
    # concatenate res from sqlfluff.parse
    def concat_dict(dict_data: dict) -> str:
        res = ""
        for value in dict_data.values():
            if isinstance(value, dict):
                res += concat_dict(value)
            elif isinstance(value, list) or isinstance(value, tuple):
                for i in range(len(value)):
                    res += concat_dict(value[i])
            else:
                res += str(value)
        return res

    # regex to find valid check positions
    def get_check_positions(text: str) -> list[int]:
        sql_keywords = [
            'WITH', # CTE
            'SELECT', 'INSERT', 'DELETE', 'UPDATE', 'UPSERT', 'REPLACE', 'MERGE', 'COMMIT', 'ROLLBACK', 'START', # DML
            'DROP', 'CREATE', 'ALTER', # DDL
        ]
        keyword_positions = []
        for keyword in sql_keywords:
            sql_pattern = re.compile(re.escape(keyword), re.IGNORECASE)
            for match in sql_pattern.finditer(text):
                keyword_positions.append(match.start())
        keyword_positions = sorted(keyword_positions)
        return keyword_positions

    res = []
    extract_pos = 0
    token_types = sqlparse.tokens._TokenType()
    error_line_pattern = r"Line (\d+), Position (\d+): (.*)"
    keyword_positions = get_check_positions(text)

    while extract_pos < len(text)-2:
        if extract_pos in keyword_positions:
            # check the text character by character
            current_statement = text[extract_pos:]
            parsed = sqlparse.parse(current_statement)[0]
            # filter based on the token type only allow DML, DDL, CTE to continue
            if parsed.tokens[0].ttype in (token_types.Keyword.DML, token_types.Keyword.DDL, token_types.Keyword.CTE):
                # try to use sqlfluff to check the syntax
                candidate_sql = ""
                current_lines = current_statement.splitlines()
                try:
                    sqlfluff.parse(current_statement, dialect=dialect)
                    candidate_sql = current_statement
                except sqlfluff.api.simple.APIParsingError as e:
                    # find out first violation's line and position
                    # parse the error message
                    msg_lines = str(e.msg).split("\n")
                    for j in range(len(msg_lines)):
                        if re.search(error_line_pattern, msg_lines[j]):
                            line_no, position = str(msg_lines[j]).split(":")[0].split(",")
                            line_no, position = int(line_no.strip().split(' ')[1]), int(position.strip().split(' ')[1])
                            # extract the candidate sql
                            for k in range(line_no-1):
                                candidate_sql += current_lines[k] + "\n"
                            candidate_sql += current_lines[line_no-1][:position-1]
                            break
                # for shorter statement, use length filter
                try:
                    parsed_res = sqlfluff.parse(candidate_sql, dialect=dialect)
                    # retrieve the first sql, sometimes it returns multiple sql separated by ;
                    if isinstance(parsed_res['file'], list):
                        candidate_sql = concat_dict(parsed_res['file'][0])
                    parsed = sqlparse.parse(candidate_sql)[0]
                    if len(candidate_sql.split()) <= 3 and parsed.tokens[0].ttype in (token_types.Keyword.DML):
                        pass
                    elif len(candidate_sql.split()) <= 2:
                        pass
                    else:
                        extract_pos += len(candidate_sql) - 1
                        res.append(candidate_sql)
                except Exception as e:
                    pass
        extract_pos += 1
    if return_all:
        return res # return all valid
    if len(res)==0:
        return ""
    else:
        return max(res, key=len) # return the longest
dosubot[bot] commented 9 months ago

Hi, @nickscut! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding of the issue, you reported an error related to the GPT4 model in the LangChain framework when executing SQL queries using the SQLDatabaseChain. It seems that the issue lies in the SQL query generation by the GPT4 model, as the default model does not generate these errors. There have been suggestions to review the SQL queries generated by the GPT4 model, implement a custom query checker, or consider using a different model. Some proposed code modifications have also been discussed.

Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your contribution, and we appreciate your understanding as we work to manage our backlog effectively. Let us know if you have any further questions or concerns.

JUNNIEJUN commented 1 month ago

Hi @nickscut, I hope it's not to late to share my methods. I'm also doing a projet using SQLDatabaseChain to create a chatbot to execute SQL query with the model gpt4.

I faced these problems, too. The way how I solved these problems is just specify the prompt.

Actually, SQLDatabaseChain does no longer accept the parameter of 'prompt'. But we can still define a template and use format.() or just use f-string to form the question. In the prompt of SQLDatabaseChain, You can ask the model to generate a SQL Query without any decoration (for example, Markdown format).

And for the problem that the model generate the general text, which will pass as a SQL query. You can give the description of table (description of each column). And ask the model to generate the SQL query only base on this description. And then the model will generate a correct query tothe chain, which can return correct results.

This is the way I found easiest. I hope it can help you and others who may have the similar problem.