tlocke / pg8000

A Pure-Python PostgreSQL Driver
BSD 3-Clause "New" or "Revised" License
515 stars 46 forks source link

group by date_trunc cause error #112

Closed alex4506 closed 1 year ago

alex4506 commented 2 years ago

Such query can be executed successfully in CLI or psycopg2

  SELECT
      a.collection_id AS collection_id,
      a.holder_address AS holder_address,
      date_trunc('hour', a.create_at) AS hour,
      count(a.id) AS sales
  FROM
      a
  WHERE
      a.collection_id  = 100
  GROUP BY
      a.collection_id,
      a.holder_address,
      date_trunc('hour', a.create_at)

But with pg8000 gave such error:

'column "holder_activity.create_at" must appear in the GROUP BY clause or be used in an aggregate function'

tlocke commented 2 years ago

Hi @flex-enjoy, I've tried to create a Minimal Reproducible Example to try to reproduce the error that you got:

from pg8000.native import Connection

with Connection("postgres") as con:
    con.run(
        "create temporary table a ("
        "collection_id integer, holder_address varchar(40), "
        "create_at timestamp, id integer)"
    )
    con.run(
        """SELECT
      a.collection_id AS collection_id,
      a.holder_address AS holder_address,
      date_trunc('hour', a.create_at) AS hour,
      count(a.id) AS sales
  FROM
      a
  WHERE
      a.collection_id  = 100
  GROUP BY
      a.collection_id,
      a.holder_address,
      date_trunc('hour', a.create_at)"""
    )

So, running the above doesn't give an error. Can you provide a Minimal Reproducible Example?

alex4506 commented 2 years ago

@tlocke

Sorry for leaving out some details, I do not call pg8000 directly but from sqlalchemy

Here is the demo script with table a in db test

from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from sqlalchemy import Integer, DateTime, String
from sqlalchemy import Column
from sqlalchemy import func
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    collection_id = Column(Integer)
    holder_address = Column(String(40))
    create_at = Column(DateTime)

engine = create_engine("postgresql+pg8000://postgres@localhost/test", echo=True, future=True)
session = Session(engine)

q = session.query(A.collection_id, A.holder_address,
            func.date_trunc('hour', A.create_at).label('hour')) \
            .group_by(A.collection_id, A.holder_address, func.date_trunc('hour', A.create_at)).all()

This will throw error

sqlalchemy.exc.ProgrammingError: (pg8000.dbapi.ProgrammingError) {'S': 'ERROR', 'V': 'ERROR', 'C': '42803', 'M': 'column "a.create_at" must appear in the GROUP BY clause or be used in an aggregate function', 'P': '97', 'F': 'parse_agg.c', 'L': '1413', 'R': 'check_ungrouped_columns_walker'}
[SQL: SELECT a.collection_id AS a_collection_id, a.holder_address AS a_holder_address, date_trunc(%s, a.create_at) AS hour 
FROM a GROUP BY a.collection_id, a.holder_address, date_trunc(%s, a.create_at)]
[parameters: ('hour', 'hour')]
(Background on this error at: https://sqlalche.me/e/14/f405)
tlocke commented 2 years ago

So I think the next step is to determine if the problem lies with SQLAlchemy or pg8000. You can see the SQL that SQLAlchemy generates (I think by just using the str function). So then you should be able to create a Minimal Reproducible Example using just pg8000 (or using just SQLAlchemy if the problem lies there). It's easier said than done I know, so good luck in getting to the bottom of it!

alex4506 commented 2 years ago

Seems using just pg8000 is ok

from pg8000.native import Connection

with Connection(user="postgres", database="test") as con:
     con.run(
     """SELECT a.collection_id AS a_collection_id, a.holder_address AS a_holder_address, date_trunc('hour', a.create_at) AS hour FROM a GROUP BY a.collection_id, a.holder_address, date_trunc('hour', a.create_at)"""

And using sqlalchemy + psycopg2 has no error too, so I think maybe there is something wrong with sqlalchemy and pg8000 in this case

tlocke commented 2 years ago

I think I've worked out what's going on here. Breaking it down to its simplest, the following fails:

from sqlalchemy import Column, DateTime, Integer, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class A(Base):
    __tablename__ = "a"
    __table_args__ = {"prefixes": ["TEMPORARY"]}
    id = Column(Integer, primary_key=True)
    create_at = Column(DateTime)

engine = create_engine("postgresql+pg8000://postgres@localhost/test")
Base.metadata.create_all(engine)
session = Session(engine)

q = select(func.date_trunc("hour", A.create_at)).group_by(
    func.date_trunc("hour", A.create_at)
)

session.execute(q)

But instead doing the following succeeds:

from sqlalchemy import Column, DateTime, Integer, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class A(Base):
    __tablename__ = "a"
    __table_args__ = {"prefixes": ["TEMPORARY"]}
    id = Column(Integer, primary_key=True)
    create_at = Column(DateTime)

engine = create_engine("postgresql+pg8000://postgres@localhost/test")
Base.metadata.create_all(engine)
session = Session(engine)

q = select(func.date_trunc("hour", A.create_at).label("hour_col")).group_by("hour_col")

session.execute(q)

So the key is that in the GROUP BY, the column should be referred to by name. The reason it works in psycopg2 is because it substitutes in the variables before sending to the server, whereas pg8000 sends the SQL with placeholders separately from the values. Since in this case the placeholders have different names ($1 and $2 behind the scenes), the server thinks that the column in the GROUP BY wasn't defined.

alex4506 commented 2 years ago

Awesome, so will it be fixed or recorded in the document?

tlocke commented 2 years ago

I can't think of a change that could make to pg8000 to improve things, but let me know if you can think of anything. As for a change to the docs, I'm not sure what we could put in the pg8000 docs to help people, but again, let me know if you have a suggestion.