googleapis / python-bigquery-sqlalchemy

SQLAlchemy dialect for BigQuery
MIT License
436 stars 130 forks source link

Parameter conflict caused by case-sensitivity mismatch #1101

Open aholyoke opened 3 months ago

aholyoke commented 3 months ago

It seems like SA handles parameter conflicts by appending an increasing integer after each duplicate parameter name, however it detects duplicate parameters in a case-sensitive fashion. So two parameters whose name only differs in capitalization will be considered not duplicates. However the BQ API seems to be using a case-insensitive scheme for detecting duplicates.

Environment details

sqlalchemy-bigquery==1.11.0
SQLAlchemy==2.0.31

Steps to reproduce

The following code:

import sqlalchemy as sa

engine = sa.create_engine("bigquery://")
conn = engine.connect()

meta = sa.MetaData()
customer_table = sa.Table(
    "customers",
    meta,
    sa.Column("user_id", sa.Integer()),
    sa.Column("STATE", sa.String()),
    schema="holyoke_test",
)

journey_table = sa.Table(
    "journey",
    meta,
    sa.Column("user_id", sa.Integer()),
    sa.Column("state", sa.String()),
    schema="holyoke_test",
)
meta.create_all(conn)

expr = sa.select(
    1,
).select_from(
    customer_table
).join(
    journey_table,
    customer_table.columns.user_id == journey_table.columns.user_id 
).where(
    customer_table.c.STATE == "California",
    journey_table.c.state == "pending",
)

conn.execute(expr).fetchall()

Will raise the following DatabaseError

DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/flywheel-dev-328120/queries?prettyPrint=false: Duplicate parameter name state_1
[SQL: SELECT 1
FROM `holyoke_test`.`customers` JOIN `holyoke_test`.`journey` ON `customers`.`user_id` = `journey`.`user_id`
WHERE `customers`.`STATE` = %(STATE_1:STRING)s AND `journey`.`state` = %(state_1:STRING)s]
[parameters: {'STATE_1': 'California', 'state_1': 'pending'}]

SA BindParameter names are usually derived from the column you're matching on. In this case, because I have the filter expression customer_table.c.STATE == "California", it will place the string "California" in a BindParameter called STATE_1, then I have another filter journey_table.c.state == "pending" which creates another BindParameter called state_1. This seems logically consistent from SA's point of view because it considers parameters with different capitalization to be different parameters. And as you can see from the error, it is rendering them in the correct location in the SQL. However, BQ API is disallowing this.

jzhizhenka commented 3 weeks ago

@alvarowolfx any updates on fixing this?

chalmerlowe commented 3 weeks ago

@jzhizhenka I am the maintainer of this repo. I am working some other priority issues at this moment. Will get to this as soon as possible.