I have this example with two nested subqueries, where the second references the first, which works if my output dialect is presto, but once the output is spark/spark2, the subqueries are flattened and the resulting order is invalid (subqueries reference subqueries that are below them)
Fully reproducible code snippet
Base configuration
import sqlglot
query = (
sqlglot.select("*")
.from_("subquery2")
.with_("subquery1", as_="with tmp as (select * from table0) select * from tmp")
.with_("subquery2", as_="with tmp2 as (select * from table1 where a in subquery1) select * from tmp2")
)
query.sql(dialect="presto") yields a valid query (subquery1 referenced inside tmp2 is valid)
WITH subquery1 AS (
WITH tmp AS (
SELECT
*
FROM table0
)
SELECT
*
FROM tmp
), subquery2 AS (
WITH tmp2 AS (
SELECT
*
FROM table1
WHERE
a IN subquery1
)
SELECT
*
FROM tmp2
)
SELECT
*
FROM subquery2
query.sql(dialect="spark") yields an invalid query (subquery1 referenced inside tmp2 is invalid as subquery1 is not yet defined):
WITH tmp2 AS (
SELECT
*
FROM table1
WHERE
a IN subquery1
), tmp AS (
SELECT
*
FROM table0
), subquery1 AS (
SELECT
*
FROM tmp
), subquery2 AS (
SELECT
*
FROM tmp2
)
SELECT
*
FROM subquery2
I have this example with two nested subqueries, where the second references the first, which works if my output dialect is
presto
, but once the output isspark
/spark2
, the subqueries are flattened and the resulting order is invalid (subqueries reference subqueries that are below them)Fully reproducible code snippet Base configuration
query.sql(dialect="presto")
yields a valid query (subquery1
referenced insidetmp2
is valid)query.sql(dialect="spark")
yields an invalid query (subquery1
referenced insidetmp2
is invalid assubquery1
is not yet defined):