tobymao / sqlglot

Python SQL Parser and Transpiler
https://sqlglot.com/
MIT License
6.48k stars 665 forks source link

wrong qualify_columns on generated columns #3687

Closed viplazylmht closed 3 months ago

viplazylmht commented 3 months ago

Before you file an issue

Fully reproducible code snippet

Prepare data:

from sqlglot import parse_one, exp
from sqlglot.optimizer.qualify_columns import qualify_columns

sql ="""with data as (
  select 1 as id, 2 as my_id, 'a' as name, 'de' as full_name
)
select
  id as my_id,
  concat(id, name) as full_name
from data
group by my_id, full_name"""

schema = {
"catalog": {
      "schema1": {
          "customer": {
              "id": "INTEGER",
          },
      }
  }
}

Transpiler from bigquery to trino:

smt = parse_one(sql, read="bigquery")
print(qualify_columns(smt, schema=schema).sql(dialect="trino", pretty=True))

Output:

WITH data AS (
  SELECT
    1 AS id,
    2 AS my_id,
    'a' AS name,
    'de' AS full_name
)
SELECT
  data.id AS my_id,
  CONCAT(CAST(data.id AS VARCHAR), CAST(data.name AS VARCHAR)) AS full_name
FROM data
GROUP BY
  data.my_id,
  data.full_name

The problem of the output is it can not be run on trino, because data.id and data.name is not an aggregate expression or appear in GROUP BY clause.

Additional test: The output will expand correctly if we no not provide schema:

smt = parse_one(sql, read="bigquery")
print(qualify_columns(smt, schema=None).sql(dialect="trino", pretty=True))

"""
WITH data AS (
  SELECT
    1 AS id,
    2 AS my_id,
    'a' AS name,
    'de' AS full_name
)
SELECT
  data.id AS my_id,
  CONCAT(CAST(data.id AS VARCHAR), CAST(data.name AS VARCHAR)) AS full_name
FROM data
GROUP BY
  data.id,
  CONCAT(CAST(data.id AS VARCHAR), CAST(data.name AS VARCHAR))
"""

The output will break again if I disable expand_alias_ref:

smt = parse_one(sql, read="bigquery")
print(qualify_columns(smt, schema=None, expand_alias_refs=False).sql(dialect="trino", pretty=True))

Solution

I have made a patch to fix my case, but I think we can have better solution to fix this.

https://github.com/viplazylmht/sqlglot/blob/644bf67720442f964ffc4606693dcfc1b87ec6e6/sqlglot/optimizer/qualify_columns.py#L423-L431

Official Documentation Please include links to official SQL documentation related to your issue.

VaggelisD commented 3 months ago

Thanks for reporting this issue, I'll investigate this on Monday unless someone beats me to it.