SQLDatabaseChain runs under the GPT4 model and reports an error

nickscut commented 1 year ago

System Info

my python code

from langchain import OpenAI, SQLDatabase, SQLDatabaseChain from langchain.chains import SQLDatabaseSequentialChain

db = SQLDatabase.from_uri("clickhouse://xx:xx@ip/db", include_tables=include_tables, custom_table_info=custom_table_schemas, sample_rows_in_table_info=2)

llm = OpenAI(temperature=0, model_name="gpt-4-0613", verbose=True, streaming=True, openai_api_base="https://xxx.cn/v1")

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, use_query_checker=True, return_intermediate_steps=True, top_k=3)

instruction = "Statistics EL_C1 device uptime today" result = db_chain(instruction) result["intermediate_steps"]

run result

DatabaseException: Orig exception: Code: 62. DB::Exception: Syntax error: failed at position 1 ('The') (line 1, col 1): The original query seems to be correct as it doesn't have any of the common mistakes mentioned. Here is the reproduction of the original query:

I don't know why these strings are being run as SQL

if set use_query_checker = False

There is an extra double quote in SQL

DatabaseException: Orig exception: Code: 62. DB::Exception: Syntax error: failed at position 1 ('"SELECT SUM(value) FROM idap_asset.EL_MODEL_Run_Time WHERE asset_code = 'EL_C1' AND toDate(CAST(window_end / 1000, 'DateTime')) = today()"'): "SELECT SUM(value) FROM idap_asset.EL_MODEL_Run_Time WHERE asset_code = 'EL_C1' AND toDate(CAST(window_end / 1000, 'DateTime')) = today()".

All these errors are only generated under the GPT4 model. If the default model is used, no errors are generated

Who can help?

No response

Information

[ ] The official example notebooks/scripts
[ ] My own modified scripts

Related Components

[ ] LLMs/Chat Models
[ ] Embedding Models
[ ] Prompts / Prompt Templates / Prompt Selectors
[ ] Output Parsers
[ ] Document Loaders
[ ] Vector Stores / Retrievers
[ ] Memory
[ ] Agents / Agent Executors
[ ] Tools / Toolkits
[X] Chains
[ ] Callbacks/Tracing
[ ] Async

Reproduction

llm = OpenAI(temperature=0, model_name="gpt-4-0613", verbose=True, streaming=True, openai_api_base="https://xxx.cn/v1")

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, use_query_checker=False, return_intermediate_steps=True, top_k=3)

As long as the GPT4 model is used, there will be problems through the db chain, and the string returned by the model will be executed as SQL

Expected behavior

The hope is to generate correct SQL execution, do not treat the string returned by the model as SQL execution

dosubot[bot] commented 1 year ago

Answer generated by a 🤖

Answer

From your description, it appears that you're encountering issues with the GPT-4 model generating incorrect SQL queries when used with the SQLDatabaseChain class in the LangChain framework. When the use_query_checker attribute is set to True, the SQL query generated by the language model is checked and potentially corrected before execution. However, if the generated SQL query is incorrect and cannot be corrected, an exception is raised.

In your case, the GPT-4 model seems to be generating SQL queries that are incorrect and cannot be corrected by the query checker, hence the exceptions. When you set use_query_checker to False, the incorrect SQL query is executed directly, leading to a syntax error.

Unfortunately, I can't see the images or links you've provided as I'm an AI and can only process text. However, based on your description, here are a few suggestions that might help:

Review the SQL queries being generated by the GPT-4 model to identify any patterns in the errors. This might help you understand why the model is generating incorrect queries.
If the errors are consistent and predictable, you might be able to implement a custom query checker to correct them.
If the GPT-4 model consistently generates incorrect SQL queries, you might want to consider using a different model. As you mentioned, the default model does not generate these errors.

Here's the relevant code snippet from the SQLDatabaseChain class that handles the SQL query generation and execution:

checked_sql_command: str = query_checker_chain.predict(
                    callbacks=_run_manager.get_child(), **query_checker_inputs
                ).strip()
                intermediate_steps.append(
                    checked_sql_command
                )  # output: sql generation (checker)
                _run_manager.on_text(
                    checked_sql_command, color="green", verbose=self.verbose
                )
                intermediate_steps.append(
                    {"sql_cmd": checked_sql_command}
                )  # input: sql exec
                result = self.database.run(checked_sql_command)
                intermediate_steps.append(str(result))  # output: sql exec
                sql_cmd = checked_sql_command

This code generates the SQL query (checked_sql_command), executes it (result = self.database.run(checked_sql_command)), and handles any exceptions that occur.

Please note that this is just one potential solution based on the information provided. If you believe this is a bug that could impact other users, you're encouraged to make a pull request with a proposed fix. If my response contradicts your experience, I would recommend upgrading to the latest version of LangChain to ensure you have the most recent updates and bug fixes.

Sources

langchain/chains/sql_database/base.py

This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

keenborder786 commented 1 year ago

@nickscut , Can you please post the Traceback of your Error.

nickscut commented 1 year ago

Entering new chain... 统计EL_C1设备今天正常运行时间 SQLQuery:"SELECT SUM(value) FROM idap_asset.EL_MODEL_Run_Time WHERE asset_code = 'EL_C1' AND toDate(CAST(window_end / 1000, 'DateTime')) = today()"

DatabaseException Traceback (most recent call last) Cell In[332], line 19 13 instruction = "统计EL_C1设备今天正常运行时间"

---> 19 result = db_chain(instruction) 20 result["intermediate_steps"]

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/base.py:166, in Chain.call(self, inputs, return_only_outputs, callbacks, tags, include_run_info) 164 except (KeyboardInterrupt, Exception) as e: 165 run_manager.on_chain_error(e) --> 166 raise e 167 run_manager.on_chain_end(outputs) 168 final_outputs: Dict[str, Any] = self.prep_outputs( 169 inputs, outputs, return_only_outputs 170 )

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/base.py:160, in Chain.call(self, inputs, return_only_outputs, callbacks, tags, include_run_info) 154 run_manager = callback_manager.on_chain_start( 155 dumpd(self), 156 inputs, 157 ) 158 try: 159 outputs = ( --> 160 self._call(inputs, run_manager=run_manager) 161 if new_arg_supported 162 else self._call(inputs) 163 ) 164 except (KeyboardInterrupt, Exception) as e: 165 run_manager.on_chain_error(e)

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/sql_database/base.py:181, in SQLDatabaseChain._call(self, inputs, run_manager) 177 except Exception as exc: 178 # Append intermediate steps to exception, to aid in logging and later 179 # improvement of few shot prompt seeds 180 exc.intermediate_steps = intermediate_steps # type: ignore --> 181 raise exc

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/sql_database/base.py:126, in SQLDatabaseChain._call(self, inputs, run_manager) 122 intermediate_steps.append( 123 sql_cmd 124 ) # output: sql generation (no checker) 125 intermediate_steps.append({"sql_cmd": sql_cmd}) # input: sql exec --> 126 result = self.database.run(sql_cmd) 127 intermediate_steps.append(str(result)) # output: sql exec 128 else:

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/sql_database.py:348, in SQLDatabase.run(self, command, fetch) 346 else: 347 connection.exec_driver_sql(f"SET search_path TO {self._schema}") --> 348 cursor = connection.execute(text(command)) 349 if cursor.returns_rows: 350 if fetch == "all":

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1306, in Connection.execute(self, statement, *multiparams, **params) 1302 util.raise_( 1303 exc.ObjectNotExecutableError(statement), replace_context=err 1304 ) 1305 else: -> 1306 return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/sql/elements.py:332, in ClauseElement._execute_on_connection(self, connection, multiparams, params, execution_options, _force) 328 def _execute_on_connection( 329 self, connection, multiparams, params, execution_options, _force=False 330 ): 331 if _force or self.supports_execution: --> 332 return connection._execute_clauseelement( 333 self, multiparams, params, execution_options 334 ) 335 else: 336 raise exc.ObjectNotExecutableError(self)

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1498, in Connection._execute_clauseelement(self, elem, multiparams, params, execution_options) 1486 compiled_cache = execution_options.get( 1487 "compiled_cache", self.engine._compiled_cache 1488 ) 1490 compiled_sql, extracted_params, cache_hit = elem._compile_w_cache( 1491 dialect=dialect, 1492 compiled_cache=compiled_cache, (...) 1496 linting=self.dialect.compiler_linting | compiler.WARN_LINTING, 1497 ) -> 1498 ret = self._execute_context( 1499 dialect, 1500 dialect.execution_ctx_cls._init_compiled, 1501 compiled_sql, 1502 distilled_params, 1503 execution_options, 1504 compiled_sql, 1505 distilled_params, 1506 elem, 1507 extracted_params, 1508 cache_hit=cache_hit, 1509 ) 1510 if has_events: 1511 self.dispatch.after_execute( 1512 self, 1513 elem, (...) 1517 ret, 1518 )

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1862, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw) 1859 branched.close() 1861 except BaseException as e: -> 1862 self._handle_dbapi_exception( 1863 e, statement, parameters, cursor, context 1864 ) 1866 return result

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:2047, in Connection._handle_dbapiexception(self, e, statement, parameters, cursor, context) 2043 util.raise( 2044 sqlalchemy_exception, with_traceback=excinfo[2], from=e 2045 ) 2046 else: -> 2047 util.raise_(exc_info[1], with_traceback=exc_info[2]) 2049 finally: 2050 del self._reentrant_error

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/util/compat.py:208, in raise_(failed resolving arguments) 205 exception.cause = replace_context 207 try: --> 208 raise exception 209 finally: 210 # credit to 211 # https://cosmicpercolator.com/2016/01/13/exception-leaks-in-python-2-and-3/ 212 # as the traceback object creates a cycle 213 del exception, replacecontext, from, with_traceback

File ~/opt/anaconda3/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1819, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw) 1817 break 1818 if not evt_handled: -> 1819 self.dialect.do_execute( 1820 cursor, statement, parameters, context 1821 ) 1823 if self._has_events or self.engine._has_events: 1824 self.dispatch.after_cursor_execute( 1825 self, 1826 cursor, (...) 1830 context.executemany, 1831 )

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/base.py:416, in ClickHouseDialect.do_execute(self, cursor, statement, parameters, context) 415 def do_execute(self, cursor, statement, parameters, context=None): --> 416 cursor.execute(statement, parameters, context=context)

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/http/connector.py:117, in Cursor.execute(self, operation, parameters, context) 114 params = {'query_id': self._query_id} 115 response_gen = transport.execute(raw_sql, params=params) --> 117 self._process_response(response_gen) 118 self._end_query()

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/http/connector.py:216, in Cursor._process_response(self, response) 213 def _process_response(self, response): 214 response = iter(response) --> 216 self._columns = next(response, None) 217 self._types = next(response, None) 218 self._response = response

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/http/transport.py:133, in RequestsTransport.execute(self, query, params) 128 def execute(self, query, params=None): 129 """ 130 Query is returning rows and these rows should be parsed or 131 there is nothing to return. 132 """ --> 133 r = self._send(query, params=params, stream=True) 134 lines = r.iter_lines() 135 try:

File ~/opt/anaconda3/lib/python3.9/site-packages/clickhouse_sqlalchemy/drivers/http/transport.py:179, in RequestsTransport._send(self, data, params, stream) 177 orig = HTTPException(r.text) 178 orig.code = r.status_code --> 179 raise DatabaseException(orig) 180 return r

DatabaseException: Orig exception: Code: 62. DB::Exception: Syntax error: failed at position 1 ('"SELECT SUM(value) FROM idap_asset.EL_MODEL_Run_Time WHERE asset_code = 'EL_C1' AND toDate(CAST(window_end / 1000, 'DateTime')) = today()"'): "SELECT SUM(value) FROM idap_asset.EL_MODEL_Run_Time WHERE asset_code = 'EL_C1' AND toDate(CAST(window_end / 1000, 'DateTime')) = today()". Expected one of: Query, Query with output, EXPLAIN, SELECT query, possibly with UNION, list of union elements, SELECT query, subquery, possibly with UNION, SELECT subquery, SELECT query, WITH, SELECT, SHOW CREATE QUOTA query, SHOW CREATE, SHOW [TEMPORARY] TABLES|DATABASES|CLUSTERS|CLUSTER 'name' [[NOT] [I]LIKE 'str'] [LIMIT expr], SHOW, EXISTS or SHOW CREATE query, EXISTS, DESCRIBE query, DESCRIBE, DESC, SHOW PROCESSLIST query, SHOW PROCESSLIST, CREATE TABLE or ATTACH TABLE query, CREATE, ATTACH, REPLACE, CREATE DATABASE query, CREATE VIEW query, CREATE DICTIONARY, CREATE LIVE VIEW query, CREATE WINDOW VIEW query, ALTER query, ALTER TABLE, ALTER LIVE VIEW, ALTER DATABASE, RENAME query, RENAME TABLE, EXCHANGE TABLES, RENAME DICTIONARY, EXCHANGE DICTIONARIES, RENAME DATABASE, DROP query, DROP, DETACH, TRUNCATE, CHECK TABLE, KILL QUERY query, KILL, OPTIMIZE query, OPTIMIZE TABLE, WATCH query, WATCH, SHOW ACCESS query, SHOW ACCESS, ShowAccessEntitiesQuery, SHOW GRANTS query, SHOW GRANTS, SHOW PRIVILEGES query, SHOW PRIVILEGES, INSERT query, INSERT INTO, USE query, USE, SET ROLE or SET DEFAULT ROLE query, SET ROLE DEFAULT, SET ROLE, SET DEFAULT ROLE, SET query, SET, SYSTEM query, SYSTEM, CREATE USER or ALTER USER query, ALTER USER, CREATE USER, CREATE ROLE or ALTER ROLE query, ALTER ROLE, CREATE ROLE, CREATE QUOTA or ALTER QUOTA query, ALTER QUOTA, CREATE QUOTA, CREATE ROW POLICY or ALTER ROW POLICY query, ALTER POLICY, ALTER ROW POLICY, CREATE POLICY, CREATE ROW POLICY, CREATE SETTINGS PROFILE or ALTER SETTINGS PROFILE query, ALTER SETTINGS PROFILE, ALTER PROFILE, CREATE SETTINGS PROFILE, CREATE PROFILE, CREATE FUNCTION query, DROP FUNCTION query, DROP access entity query, GRANT or REVOKE query, REVOKE, GRANT, EXTERNAL DDL query, EXTERNAL DDL FROM, TCL query, BEGIN TRANSACTION, COMMIT, ROLLBACK, SET TRANSACTION SNAPSHOT, BACKUP or RESTORE query, BACKUP, RESTORE. (SYNTAX_ERROR) (version 22.5.1.2079 (official build))

nickscut commented 1 year ago

@keenborder786

nickscut commented 1 year ago

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, use_query_checker=True, return_intermediate_steps=True, top_k=3)

if set use_query_checker=True

then Traceback of my Error @keenborder786

Entering new chain... 统计EL_C1设备今天正常运行时间 SQLQuery:The original query seems to be correct as it doesn't contain any of the common mistakes listed. Here is the reproduction of the original query:

"SELECT `asset_code`, `value` FROM idap_asset.EL_MODEL_Run_Time WHERE `asset_code` = 'EL_C1' AND toDate(CAST(`window_end` / 1000, 'DateTime')) = today() LIMIT 3"

DatabaseException Traceback (most recent call last) Cell In[334], line 19 13 instruction = "统计EL_C1设备今天正常运行时间" ---> 19 result = db_chain(instruction) 20 result["intermediate_steps"]

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/base.py:166, in Chain.call(self, inputs, return_only_outputs, callbacks, tags, include_run_info) 164 except (KeyboardInterrupt, Exception) as e: 165 run_manager.on_chain_error(e) --> 166 raise e 167 run_manager.on_chain_end(outputs) 168 final_outputs: Dict[str, Any] = self.prep_outputs( 169 inputs, outputs, return_only_outputs 170 )

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/base.py:160, in Chain.call(self, inputs, return_only_outputs, callbacks, tags, include_run_info) 154 run_manager = callback_manager.on_chain_start( 155 dumpd(self), 156 inputs, 157 ) 158 try: 159 outputs = ( --> 160 self._call(inputs, run_manager=run_manager) 161 if new_arg_supported 162 else self._call(inputs) 163 ) 164 except (KeyboardInterrupt, Exception) as e: 165 run_manager.on_chain_error(e)

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/sql_database/base.py:181, in SQLDatabaseChain._call(self, inputs, run_manager) 177 except Exception as exc: 178 # Append intermediate steps to exception, to aid in logging and later 179 # improvement of few shot prompt seeds 180 exc.intermediate_steps = intermediate_steps # type: ignore --> 181 raise exc

File ~/opt/anaconda3/lib/python3.9/site-packages/langchain/chains/sql_database/base.py:151, in SQLDatabaseChain._call(self, inputs, run_manager) 145 _run_manager.on_text( 146 checked_sql_command, color="green", verbose=self.verbose 147 ) 148 intermediate_steps.append( 149 {"sql_cmd": checked_sql_command} 150 ) # input: sql exec --> 151 result = self.database.run(checked_sql_command) 152 intermediate_steps.append(str(result)) # output: sql exec 153 sql_cmd = checked_sql_command