googleapis / python-bigquery-sqlalchemy

SQLAlchemy dialect for BigQuery
MIT License
426 stars 127 forks source link

fix: Use except distinct and intersect distinct #1094

Closed aholyoke closed 1 month ago

aholyoke commented 2 months ago

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Fixes # 🦕

According to BQ docs, the syntax for except and intersect is EXCEPT DISTINCT and INTERSECT DISTINCT respectively.

For example,

import sqlalchemy as sa

engine = sa.create_engine("bigquery://")
conn = engine.connect()

meta = sa.MetaData()
table = sa.Table(
    "set_example",
    meta,
    sa.Column("id", sa.Integer()),
    sa.Column("value", sa.Integer()),
    schema=DATASET_NAME,
)
table.create(bind=conn)

conn.execute(table.insert().values([
    {"id": 1, "value": 5},
    {"id": 2, "value": 2},
    {"id": 3, "value": 4},
    {"id": 4, "value": 9},
    {"id": 5, "value": 7},
    {"id": 6, "value": 1},
]))

expr = sa.select(table.c.id).except_(
    sa.select(table.c.id).where(table.c.value >= 5),
)
conn.execute(expr).fetchall()

Before this change we get:

DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 EXCEPT must be followed by ALL, DISTINCT, or "(" at [2:35]

[SQL: SELECT `set_example`.`id`
FROM `holyoke_test`.`set_example` EXCEPT SELECT `set_example`.`id`
FROM `holyoke_test`.`set_example`
WHERE `set_example`.`value` >= %(value_1:INT64)s]
[parameters: {'value_1': 5}]
(Background on this error at: https://sqlalche.me/e/14/4xp6)

After this change we get: [(2,), (3,), (6,)]

Similarly, with

expr = sa.select(table.c.id).intersect(
    sa.select(table.c.id).where(table.c.value >= 5),
)
conn.execute(expr).fetchall()

Before this change we get:

DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 Syntax error: Expected keyword ALL or keyword DISTINCT but got keyword SELECT at [2:45]

[SQL: SELECT `set_example`.`id`
FROM `holyoke_test`.`set_example` INTERSECT SELECT `set_example`.`id`
FROM `holyoke_test`.`set_example`
WHERE `set_example`.`value` >= %(value_1:INT64)s]
[parameters: {'value_1': 5}]
(Background on this error at: https://sqlalche.me/e/14/4xp6)

After this change we get: [(5,), (1,), (4,)]

It seems that this was discussed here with the maintainers of SA: https://github.com/sqlalchemy/sqlalchemy/issues/11545, and the recommendation is for the dialect to implement the correct compound keywords for except_ and intersect