parkervg / blendsql

Query language for blending SQL logic and LLM reasoning across structured + unstructured data. [Findings of ACL 2024]
https://parkervg.github.io/blendsql/
Apache License 2.0
72 stars 4 forks source link

Integrating `qualify_columns` from sqlglot #6

Closed parkervg closed 8 months ago

parkervg commented 8 months ago

This PR uses the qualify_columns function from sqlglot to disambiguate column names referenced in a multi-table environment.

TL;DR:

This is useful, since from the below query (taken from the test_join_multi_exec test):

SELECT "Run Date", Account, Action, ROUND("Amount ($)", 2) AS 'Total Dividend Payout ($$)', Name
FROM account_history
LEFT JOIN constituents ON account_history.Symbol = constituents.Symbol
WHERE Sector = 'Information Technology'
AND {{starts_with('A', 'constituents::Name')}} = 1
AND lower(Action) like "%dividend%"

We want to recover the following abstracted queries for the 2 tables constituents and account_history in table_star_queries():

SELECT * FROM "account_history" WHERE TRUE AND LOWER(account_history.Action) LIKE "%dividend%" AND TRUE AND TRUE

SELECT * FROM "constituents" WHERE TRUE AND TRUE AND constituents.Sector = 'Information Technology' AND TRUE` and setting to `4e8a_constituents_0

In order to do this, though, we need to know that the Sector column comes from the constituents table, and Action comes from account_history.

Prior to parsing in BlendSQL, we now do that disambiguation so users don't always need to qualify the column references themselves in the {tablename}.{columnname} sort of style. If it works in native SQLite, it'll work in BlendSQL.