Closed nipilii closed 1 week ago
Hey @nipilii,
Thanks for your kind words! I'd suggest that you:
repr
of the top-level node to see what's going on under the hood:>>> import sqlglot
>>> ast = sqlglot.parse_one("SELECT * FROM table1 t1, table2 t2")
>>> print(repr(ast))
Select(
expressions=[
Star()],
from=From(
this=Table(
this=Identifier(this=table1, quoted=False),
alias=TableAlias(
this=Identifier(this=t1, quoted=False)))),
joins=[
Join(
this=Table(
this=Identifier(this=table2, quoted=False),
alias=TableAlias(
this=Identifier(this=t2, quoted=False))))])
As you can see, the comma separated tables in this case are parsed as JOINs so they'd be stored in the joins
arg of the exp.Select
, which is on the same level as from
on the parsed tree.
I'll go ahead and close this, for any similar/follow up questions you can also join our Slack channel.
Thank you for creating and maintaining sqlglot. It's been incredibly helpful for SQL parsing tasks. I encountered an issue when trying to extract all tables and subqueries from the top-level FROM clause of a query and was hoping you could provide some guidance.
Example Query SELECT t1.col1, t2.col2 FROM table1 t1, table2 t2, (SELECT col1, col2 FROM table2 WHERE col3 = 'A') t3 WHERE t1.col1 = t2.col1;
Desired Output For the above query, I would like to extract all tables and subqueries from the top-level FROM clause:
table1 t1 table2 t2 Subquery t3 with its SQL.
Current Behavior Here’s what I am doing:
parsed = parse_one(query) from_clause = parsed.args.get("from") print("Extracted FROM Clause:", from_clause)
def explore_from_clause(from_clause): if not from_clause: return if isinstance(from_clause, exp.From): items = from_clause.args.get("expressions", []) or [from_clause.args.get("this")] for item in items: if isinstance(item, exp.Table): # Table handling print(f"Table: {item.this}, Alias: {item.alias}") elif isinstance(item, exp.Subquery): # Subquery handling print(f"Subquery Alias: {item.alias}") print(f"Subquery SQL: {item.this.sql()}") else: print(f"Unhandled Expression: {item}") else: print("No valid FROM clause found.")
explore_from_clause(from_clause)
Result: The first table, table1 t1, gets extracted correctly. The subquery t3 can be manually extracted. However, table2 t2 is omitted entirely from the from_clause.
Additional Example Without a Subquery For a simpler query like: SELECT t1.col1, t2.col2, t3.col3 FROM table1 t1, table2 t2, table3 t3;
I only get the first table (table1 t1) when analyzing the from_clause using sqlglot. The other tables (table2 t2 and table3 t3) are omitted.
My Question
Any help or guidance would be greatly appreciated!