run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

When database has schemas, llama_index always reverts to "dbo" inside base.py #7070

Closed krish240574 closed 1 year ago

krish240574 commented 1 year ago

Bug Description

Hi, I'm trying to create a GPTSQLStructStoreIndex object for an MS-SQL database that has two schemas, "sales" and "production". Whenever I create the GPTSQLStructStoreIndex, I specify the sql_database object as SQLDatabase(engine, schema="sales"). However, at query time, llama_index always reverts to the "dbo" schema and reports my table names as not found. Any clues as to why this is happening? Thanks very much

Version

0.7.13

Steps to Reproduce

```python
from sqlalchemy import MetaData


def get_table_names(engine):
    # Create a SQLAlchemy metadata object to store database schema information
    metadata = MetaData()

    # Reflect all tables from the database into the metadata object
    metadata.reflect(bind=engine, schema="sales")

    # Get the list of table names from the metadata object
    table_names = metadata.tables.keys()

    return list(table_names)
```
Create a database in MS-SQL Server with schemas "sales" and "production".
Create tables sales.stores and sales.staffs, such as:

```sql
CREATE TABLE sales.stores (
    store_id INT IDENTITY (1, 1) PRIMARY KEY,
    store_name VARCHAR (255) NOT NULL,
    phone VARCHAR (25),
    email VARCHAR (255),
    street VARCHAR (255),
    city VARCHAR (255),
    state VARCHAR (10),
    zip_code VARCHAR (5)
);

CREATE TABLE sales.staffs (
    staff_id INT IDENTITY (1, 1) PRIMARY KEY,
    first_name VARCHAR (50) NOT NULL,
    last_name VARCHAR (50) NOT NULL,
    email VARCHAR (255) NOT NULL UNIQUE,
    phone VARCHAR (25),
    active TINYINT NOT NULL,
    store_id INT NOT NULL,
    manager_id INT,
    FOREIGN KEY (store_id) REFERENCES sales.stores (store_id)
        ON DELETE CASCADE ON UPDATE CASCADE,
    FOREIGN KEY (manager_id) REFERENCES sales.staffs (staff_id)
        ON DELETE NO ACTION ON UPDATE NO ACTION
);
```

```python
engine = create_engine(f"mssql+pymssql://{user}:{password}@{host}/{dbname}")

table_names = get_table_names(engine)
```

Now create a GPTSQLStructStoreIndex object like:

```python
index = GPTSQLStructStoreIndex(
    [],
    sql_database=SQLDatabase(engine, schema="sales"),
    table_name=table_names,
    service_context=service_context,
)
```
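One detail worth flagging in the repro above: `MetaData.reflect(schema=...)` stores its table keys schema-qualified, so `get_table_names` returns names like "sales.stores" rather than bare "stores". A minimal, runnable sketch of this (SQLite's ATTACH stands in for an MS-SQL schema, since no server is assumed here):

```python
from sqlalchemy import MetaData, create_engine, text

# SQLite's ATTACH emulates a named schema; the behavior shown is generic
# SQLAlchemy reflection, not dialect-specific.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("ATTACH DATABASE ':memory:' AS sales"))
    conn.execute(text("CREATE TABLE sales.stores (store_id INTEGER PRIMARY KEY)"))

metadata = MetaData()
metadata.reflect(bind=engine, schema="sales")

# Reflected keys come back schema-qualified:
print(list(metadata.tables.keys()))  # ['sales.stores']
```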

Relevant Logs/Tracebacks

```
File "/home/dbuser/dbuser1/lib/python3.10/site-packages/llama_index/indices/struct_store/container_builder.py", line 57, in __init__
    self.full_context_dict = self._build_context_from_sql_database(
  File "/home/dbuser/dbuser1/lib/python3.10/site-packages/llama_index/indices/struct_store/container_builder.py", line 87, in _build_context_from_sql_database
    table_desc = sql_database.get_single_table_info(table_name)
  File "/home/dbuser/dbuser1/lib/python3.10/site-packages/llama_index/langchain_helpers/sql_wrapper.py", line 56, in get_single_table_info
    for foreign_key in self._inspector.get_foreign_keys(table_name):
  File "/home/dbuser/dbuser1/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 1047, in get_foreign_keys
    return self.dialect.get_foreign_keys(
  File "<string>", line 2, in get_foreign_keys
  File "/home/dbuser/dbuser1/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 97, in cache
    ret = fn(self, con, *args, **kw)
  File "/home/dbuser/dbuser1/lib/python3.10/site-packages/sqlalchemy/dialects/mssql/base.py", line 2891, in wrap
    return _switch_db(
  File "/home/dbuser/dbuser1/lib/python3.10/site-packages/sqlalchemy/dialects/mssql/base.py", line 2915, in _switch_db
    return fn(*arg, **kw)
  File "/home/dbuser/dbuser1/lib/python3.10/site-packages/sqlalchemy/dialects/mssql/base.py", line 4041, in get_foreign_keys
    return self._default_or_error(
  File "/home/dbuser/dbuser1/lib/python3.10/site-packages/sqlalchemy/dialects/mssql/base.py", line 3416, in _default_or_error
    raise exc.NoSuchTableError(f"{owner}.{tablename}")
sqlalchemy.exc.NoSuchTableError: dbo.customers
```
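For what it's worth, this traceback pattern can be reproduced outside llama_index: SQLAlchemy's Inspector methods look only in the dialect's default schema ("dbo" on MS-SQL) unless `schema=` is passed explicitly. A hedged sketch, with SQLite's ATTACH standing in for an MS-SQL schema:

```python
from sqlalchemy import create_engine, exc, inspect, text

# ATTACH gives SQLite a second, named schema so we can mimic "sales"
# versus the default schema ("main" here, "dbo" on MS-SQL).
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("ATTACH DATABASE ':memory:' AS sales"))
    conn.execute(text("CREATE TABLE sales.stores (store_id INTEGER PRIMARY KEY)"))

inspector = inspect(engine)
print(inspector.get_table_names())                # [] -- default schema only
print(inspector.get_table_names(schema="sales"))  # ['stores']

# Without schema=, reflection falls back to the default schema and the
# lookup fails, mirroring the NoSuchTableError above.
try:
    inspector.get_columns("stores")
except exc.NoSuchTableError as e:
    print("NoSuchTableError:", e)
```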
Disiok commented 1 year ago

From the trace, it looks like the error is in sqlalchemy? Are you sure you are able to properly access the table?

krish240574 commented 1 year ago

Possibly an error in SQLAlchemy; let me try to reproduce it with a Postgres schema as well, which should narrow down the cause. Thanks for the reply.

When you say "properly access the table", do you mean from other code? Yes, I was able to get table names using MetaData and reflection, as in the following code:

```python
# Create a SQLAlchemy metadata object to store database schema information
metadata = MetaData()

# Reflect all tables from the database into the metadata object
metadata.reflect(bind=engine, schema="sales")

# Get the list of table names from the metadata object
table_names = metadata.tables.keys()

return list(table_names)
```

krish240574 commented 1 year ago

With Postgres, here is what I get (need to RTFM, I guess...).

I've tried the same code with no schemas, and it works perfectly.

Created a database "newdb", created a schema "sales", and created a table sales.orders:

```python
sql_database = SQLDatabase(engine, schema="sales")

# Create index from database and context container
context_container = SQLContextContainerBuilder(sql_database)
index = GPTSQLStructStoreIndex(
    [],
    sql_database=sql_database,
    table_name=table_names,
    service_context=service_context,
    sql_context_container=context_container,
)
query_engine = index.as_query_engine()
```

```
NoSuchTableError                          Traceback (most recent call last)

File ~/miniconda3/lib/python3.11/site-packages/llama_index/indices/struct_store/container_builder.py:57, in SQLContextContainerBuilder.__init__(self, sql_database, context_dict, context_str)
     55 self.context_dict = context_dict or {}
     56 # build full context from sql_database
---> 57 self.full_context_dict = self._build_context_from_sql_database(
     58     self.sql_database, current_context=self.context_dict
     59 )
     60 self.context_str = context_str

File ~/miniconda3/lib/python3.11/site-packages/llama_index/indices/struct_store/container_builder.py:85, in SQLContextContainerBuilder._build_context_from_sql_database(self, sql_database, current_context)
     83 result_context = {}
     84 for table_name in sql_database.get_usable_table_names():
---> 85     table_desc = sql_database.get_single_table_info(table_name)
     86     table_text = f"Schema of table {table_name}:\n" f"{table_desc}\n"
     87     if table_name in current_context:

File ~/miniconda3/lib/python3.11/site-packages/llama_index/langchain_helpers/sql_wrapper.py:52, in SQLDatabase.get_single_table_info(self, table_name)
     47 template = (
     48     "Table '{table_name}' has columns: {columns} "
     49     "and foreign keys: {foreign_keys}."
     50 )
     51 columns = []
---> 52 for column in self._inspector.get_columns(table_name):
     53     columns.append(f"{column['name']} ({str(column['type'])})")
     54 column_str = ", ".join(columns)

File ~/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/reflection.py:859, in Inspector.get_columns(self, table_name, schema, **kw)
    835 r"""Return information about columns in table_name.
    836
    837 Given a string table_name and an optional string schema,
        (...)
    855
    856 """
    858 with self._operation_context() as conn:
--> 859     col_defs = self.dialect.get_columns(
    860         conn, table_name, schema, info_cache=self.info_cache, **kw
    861     )
    862 if col_defs:
    863     self._instantiate_types([col_defs])

File <string>:2, in get_columns(self, connection, table_name, schema, **kw)

File ~/miniconda3/lib/python3.11/site-packages/sqlalchemy/engine/reflection.py:97, in cache(fn, self, con, *args, **kw)
     95 ret: _R = info_cache.get(key)
     96 if ret is None:
---> 97     ret = fn(self, con, *args, **kw)
     98 info_cache[key] = ret
     99 return ret

File ~/miniconda3/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/base.py:3481, in PGDialect.get_columns(self, connection, table_name, schema, **kw)
   3471 @reflection.cache
   3472 def get_columns(self, connection, table_name, schema=None, **kw):
   3473     data = self.get_multi_columns(
   3474         connection,
   3475         schema=schema,
            (...)
   3479         **kw,
   3480     )
-> 3481     return self._value_or_raise(data, table_name, schema)

File ~/miniconda3/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/base.py:3449, in PGDialect._value_or_raise(self, data, table, schema)
   3447     return dict(data)[(schema, table)]
   3448 except KeyError:
-> 3449     raise exc.NoSuchTableError(
   3450         f"{schema}.{table}" if schema else table
   3451     ) from None

NoSuchTableError: orders
```

krish240574 commented 1 year ago

Progress!!! This seems to do the trick for Postgres:

```python
dbschema = 'sales,production'  # searched left-to-right; add all schemas, comma-separated
engine = create_engine(
    "postgresql+psycopg2://user:password@localhost/newdb",
    connect_args={'options': '-csearch_path={}'.format(dbschema)},
)
```

Now onto the beast called MSSQL.

This is the life-saving post: https://stackoverflow.com/questions/9298296/sqlalchemy-support-of-postgres-schemas

krish240574 commented 1 year ago

More progress:

```python
engine = create_engine(f"mssql+pymssql://{user}:{password}@{host}/{dbname}")

schema_translate_map = {
    None: 'sales',              # if no schema is specified, use 'sales'
    'production': 'production'  # use 'production' for tables with the 'production' schema prefix
}

# Set the schema_translate_map using execution_options
engine = engine.execution_options(schema_translate_map=schema_translate_map)
```

This works for MSSQL, and the SQLDatabase object then simply needs to be created as SQLDatabase(engine).
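The schema_translate_map approach can be sanity-checked without an MS-SQL server. A hedged sketch using SQLite's ATTACH in place of a named schema (table and column names are illustrative):

```python
from sqlalchemy import (Column, Integer, MetaData, Table, Text,
                        create_engine, insert, select, text)

# SQLite's ATTACH provides a named schema "sales" to translate into.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("ATTACH DATABASE ':memory:' AS sales"))
    conn.execute(text(
        "CREATE TABLE sales.stores (store_id INTEGER PRIMARY KEY, store_name TEXT)"
    ))

# Tables defined with no schema are rewritten to "sales" at compile time,
# which is why SQLDatabase(engine) then works unchanged.
translated = engine.execution_options(schema_translate_map={None: "sales"})

metadata = MetaData()
stores = Table(  # defined without a schema; the map supplies "sales"
    "stores", metadata,
    Column("store_id", Integer, primary_key=True),
    Column("store_name", Text),
)

with translated.begin() as conn:
    conn.execute(insert(stores).values(store_id=1, store_name="Downtown"))

with translated.connect() as conn:
    rows = conn.execute(select(stores)).all()

print(rows)  # [(1, 'Downtown')]
```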

krish240574 commented 1 year ago

Not a llama_index bug; I just needed to use the right SQLAlchemy options.