kayak / pypika

PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
http://pypika.readthedocs.io/en/latest/
Apache License 2.0
2.43k stars 293 forks source link

psycopg2/postgresql throws ambiguous column error because groupby clause doesn't generate with namespaces set correctly #728

Open jpjagt opened 1 year ago

jpjagt commented 1 year ago

hey there! thank you for this amazing library. i love using it!

i experienced an error which i believe is related to some choices when constructing the query. i have only experienced this in my case, which is PostgreSQL (with sqlalchemy/psycopg2). in this toy example, i have two tables: A and B. they both have a column called foo.

if i run the following code:

import pypika as ppk

a_tbl = ppk.Table("A")
b_tbl = ppk.Table("B")

a_foo_col = a_tbl.foo.as_("foo")

query_err = ppk.Query.from_(a_tbl).select(a_foo_col, b_tbl.bar.as_("bar")).join(b_tbl).on(
    a_tbl.id == b_tbl.a_id
).groupby(a_foo_col)

str(query_err)

this is the resulting query:

'SELECT "A"."foo" "foo","B"."bar" "bar" FROM "A" JOIN "B" ON "A"."id"="B"."a_id" GROUP BY "foo"'

however, running this leads to the following error: sqlalchemy.exc.ProgrammingError: (psycopg2.errors.AmbiguousColumn) column reference "foo" is ambiguous

if i construct the query in the following way, it runs without a problem (because the group by column is properly namespaced):

query_noproblem = ppk.Query.from_(a_tbl).select(a_foo_col, b_tbl.bar.as_("bar")).join(b_tbl).on(
    a_tbl.id == b_tbl.a_id
).groupby(
   a_foo_col.get_sql() # this causes the GROUP BY to be namespaced for some reason
)

str(query_noproblem)
# => 'SELECT "A"."foo" "foo","B"."bar" "bar" FROM "A" JOIN "B" ON "A"."id"="B"."a_id" GROUP BY "A"."foo"'

as for group by'ing on a column from B (not the main table selection), the behaviour is slightly different but it still doesn't seem entirely correct:

so it seems like at least in the group by clause (and maybe others), at least in postgresql, the column always needs to be namespaced properly to avoid possible ambiguity errors.

i hope this example is reproducible, i changed the column & table names to simplify the query, but i'm not able to test it on a real psql database right now.