run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
33.67k stars 4.73k forks source link

[Documentation]: Advanced Text-to-SQL guide does not add other tables #8088

Closed AbhiPawar5 closed 2 months ago

AbhiPawar5 commented 9 months ago

Documentation Issue Description

In the guide, there is no code to add the other tables except city_stats in the SQL database. And the guide mentions it can search on large number of tables to get the SQL from natural query. However, we see only a single table is added to the SQL DB instance in the following line:

sql_database = SQLDatabase(engine, include_tables=["city_stats"])

Can you please post the correct code before updating the documentation? Thanks. P.S New to Llama Index.

Documentation Link

https://gpt-index.readthedocs.io/en/latest/examples/index_structs/struct_indices/duckdb_sql_query.html#advanced-text-to-sql-with-our-sqltableretrieverqueryengine

dosubot[bot] commented 9 months ago

πŸ€–

Hello,

Thank you for bringing this to our attention. You're correct that the guide currently only shows how to add a single table to the SQL database. However, you can add multiple tables by creating a new Table object for each table you want to add, and then adding them to the MetaData object using metadata_obj.create_all(engine). Here's an example of how you can add two tables, city_stats and country_stats:

# create city SQL table
table_name1 = "city_stats"
city_stats_table = Table(
    table_name1,
    metadata_obj,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)

# create country SQL table
table_name2 = "country_stats"
country_stats_table = Table(
    table_name2,
    metadata_obj,
    Column("country_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("continent", String(16), nullable=False),
)

metadata_obj.create_all(engine)

Then, when you create the SQLDatabase object, you need to include both table names in the include_tables parameter:

sql_database = SQLDatabase(engine, include_tables=["city_stats", "country_stats"])

And when you create the NLSQLTableQueryEngine object, you also need to include both table names in the tables parameter:

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["city_stats", "country_stats"],
)

Finally, when you create the ObjectIndex, you need to create a SQLTableSchema object for each table:

table_schema_objs = [SQLTableSchema(table_name="city_stats"), SQLTableSchema(table_name="country_stats")]

And pass them to the ObjectIndex.from_objects method:

obj_index = ObjectIndex.from_objects(
    table_schema_objs,
    table_node_mapping,
    VectorStoreIndex,
)

In the context shared, you can also add additional context information for each table schema you define:

# manually set context text
city_stats_text = (
    "This table gives information regarding the population and country of a given city.\n"
    "The user will query with codewords, where 'foo' corresponds to population and 'bar'"
    "corresponds to city."
)

table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [
    (SQLTableSchema(table_name="city_stats", context_str=city_stats_text))
]

I hope this helps! If you have any other questions, feel free to ask.

Best, Dosu

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

AbhiPawar5 commented 9 months ago

πŸ€–

Hello,

Thank you for bringing this to our attention. You're correct that the guide currently only shows how to add a single table to the SQL database. However, you can add multiple tables by creating a new Table object for each table you want to add, and then adding them to the MetaData object using metadata_obj.create_all(engine). Here's an example of how you can add two tables, city_stats and country_stats:

# create city SQL table
table_name1 = "city_stats"
city_stats_table = Table(
    table_name1,
    metadata_obj,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)

# create country SQL table
table_name2 = "country_stats"
country_stats_table = Table(
    table_name2,
    metadata_obj,
    Column("country_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("continent", String(16), nullable=False),
)

metadata_obj.create_all(engine)

Then, when you create the SQLDatabase object, you need to include both table names in the include_tables parameter:

sql_database = SQLDatabase(engine, include_tables=["city_stats", "country_stats"])

And when you create the NLSQLTableQueryEngine object, you also need to include both table names in the tables parameter:

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["city_stats", "country_stats"],
)

Finally, when you create the ObjectIndex, you need to create a SQLTableSchema object for each table:

table_schema_objs = [SQLTableSchema(table_name="city_stats"), SQLTableSchema(table_name="country_stats")]

And pass them to the ObjectIndex.from_objects method:

obj_index = ObjectIndex.from_objects(
    table_schema_objs,
    table_node_mapping,
    VectorStoreIndex,
)

In the context shared, you can also add additional context information for each table schema you define:

# manually set context text
city_stats_text = (
    "This table gives information regarding the population and country of a given city.\n"
    "The user will query with codewords, where 'foo' corresponds to population and 'bar'"
    "corresponds to city."
)

table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [
    (SQLTableSchema(table_name="city_stats", context_str=city_stats_text))
]

I hope this helps! If you have any other questions, feel free to ask.

Best, Dosu

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

How can I index and add hundreds or thousands of tables? Is there any to simplify the above process?

dosubot[bot] commented 9 months ago

πŸ€–

Hello,

Yes, your code for adding multiple tables to the SQL database in LlamaIndex is mostly correct. However, there are a few adjustments needed.

Firstly, you need to define table_node_mapping before using it in ObjectIndex.from_objects(). It should be defined after creating your tables and before creating your ObjectIndex.

Secondly, you need to add country_stats to your table_schema_objs list.

Here's the corrected code:

# create city SQL table
table_name1 = "city_stats"
city_stats_table = Table(
    table_name1,
    metadata_obj,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)

# create country SQL table
table_name2 = "country_stats"
country_stats_table = Table(
    table_name2,
    metadata_obj,
    Column("country_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("continent", String(16), nullable=False),
)

metadata_obj.create_all(engine)

sql_database = SQLDatabase(engine, include_tables=["city_stats", "country_stats"])

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["city_stats", "country_stats"],
)

table_node_mapping = SQLTableNodeMapping(sql_database)

table_schema_objs = [
    SQLTableSchema(table_name="city_stats"),
    SQLTableSchema(table_name="country_stats")
]

obj_index = ObjectIndex.from_objects(
    table_schema_objs,
    table_node_mapping,
    VectorStoreIndex,
)

# manually set context text
city_stats_text = (
    "This table gives information regarding the population and country of a given city.\n"
    "The user will query with codewords, where 'foo' corresponds to population and 'bar'"
    "corresponds to city."
)

country_stats_text = (
    "This table gives information regarding the population and continent of a given country.\n"
    "The user will query with codewords, where 'foo' corresponds to population and 'bar'"
    "corresponds to country."
)

table_schema_objs = [
    SQLTableSchema(table_name="city_stats", context_str=city_stats_text),
    SQLTableSchema(table_name="country_stats", context_str=country_stats_text)
]

Regarding your question about metadata_obj.create_all(engine), this line is used to create all tables stored in the metadata. In SQLAlchemy, which LlamaIndex uses for SQL operations, MetaData.create_all() is a method that issues CREATE statements for all Table objects that don’t yet exist in the database. This method can be called multiple times and if a particular table already exists in the database then it will be skipped.

I hope this helps! Let me know if you have any other questions.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

sbisen commented 6 months ago

@dosu-bot Is there a way to add multiple table from different database and schema in SQLDatabase? Presently these tables are from 1 database: sql_database = SQLDatabase(engine, include_tables=["city_stats", "country_stats"]) if we have 2 different database, how to include tables in that case?

dosubot[bot] commented 6 months ago

πŸ€–

To handle hundreds or thousands of tables, you can automate the process by creating a function that generates the necessary objects for each table. Here's an example of how you can do this:

def create_table_objects(table_names, metadata_obj, engine):
    table_objects = []
    for table_name in table_names:
        table = Table(
            table_name,
            metadata_obj,
            Column("name", String(16), primary_key=True),
            Column("population", Integer),
            Column("country", String(16), nullable=False),
        )
        table_objects.append(table)
    metadata_obj.create_all(engine)
    return table_objects

You can then use this function to create the Table objects for all your tables:

table_names = ["table1", "table2", "table3", ..., "tableN"]  # replace with your table names
table_objects = create_table_objects(table_names, metadata_obj, engine)

And include all the table names when you create the SQLDatabase and NLSQLTableQueryEngine objects:

sql_database = SQLDatabase(engine, include_tables=table_names)
query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=table_names)

Finally, you can create the ObjectIndex by creating a SQLTableSchema object for each table:

table_schema_objs = [SQLTableSchema(table_name=table_name) for table_name in table_names]
obj_index = ObjectIndex.from_objects(table_schema_objs, table_node_mapping, VectorStoreIndex)

This should simplify the process of adding multiple tables to the LlamaIndex. However, please note that this is a simplified example and you may need to adjust the code to fit your specific needs, such as the columns for each table.

For more information, you can refer to the following sources:

I hope this helps! If you have any other questions, feel free to ask.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.