tobymao / sqlglot

Python SQL Parser and Transpiler
https://sqlglot.com/
MIT License
6.76k stars 709 forks source link

nondeterministic column order in optimized plans #3709

Closed cpcloud closed 4 months ago

cpcloud commented 4 months ago

sqlglot optimizer reports aggregation columns in an order that doesn't always match the input query:

import sqlglot as sg
import sqlglot.expressions as sge
import sqlglot.optimizer as sgo
import sqlglot.planner as sgp

raw_sql = """
SELECT
  AVG(bill_length_mm) AS avg_bill_length,
  AVG(bill_depth_mm) AS avg_bill_depth
FROM penguins
"""

catalog = {
    "penguins": {
        "bill_length_mm": sge.DataType(kind=sge.DataType.Type.DOUBLE),
        "bill_depth_mm": sge.DataType(kind=sge.DataType.Type.DOUBLE),
    },
}
expr = sg.parse_one(raw_sql, read="duckdb")
tree = sgo.optimize(expr, catalog)
plan = sgp.Plan(tree)
assert [agg.alias for agg in plan.root.aggregations] == [
    "avg_bill_length",
    "avg_bill_depth",
]

The assert only sometimes fails, so to reproduce you'll probably have to run the example more than once.

cpcloud commented 4 months ago

Wow, bugs seem to just fix themselves around here, for fear that @tobymao might look in their general direction!

cpcloud commented 4 months ago

Also, thank you for fixing this so quickly 😂