Closed alex4506 closed 1 year ago
Hi @flex-enjoy, I've tried to create a Minimal Reproducible Example to try to reproduce the error that you got:
from pg8000.native import Connection
with Connection("postgres") as con:
con.run(
"create temporary table a ("
"collection_id integer, holder_address varchar(40), "
"create_at timestamp, id integer)"
)
con.run(
"""SELECT
a.collection_id AS collection_id,
a.holder_address AS holder_address,
date_trunc('hour', a.create_at) AS hour,
count(a.id) AS sales
FROM
a
WHERE
a.collection_id = 100
GROUP BY
a.collection_id,
a.holder_address,
date_trunc('hour', a.create_at)"""
)
So, running the above doesn't give an error. Can you provide a Minimal Reproducible Example?
@tlocke
Sorry for leaving out some details, I do not call pg8000 directly but from sqlalchemy
Here is the demo script with table a
in db test
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from sqlalchemy import Integer, DateTime, String
from sqlalchemy import Column
from sqlalchemy import func
from sqlalchemy.orm import declarative_base
Base = declarative_base()
class A(Base):
__tablename__ = 'a'
id = Column(Integer, primary_key=True)
collection_id = Column(Integer)
holder_address = Column(String(40))
create_at = Column(DateTime)
engine = create_engine("postgresql+pg8000://postgres@localhost/test", echo=True, future=True)
session = Session(engine)
q = session.query(A.collection_id, A.holder_address,
func.date_trunc('hour', A.create_at).label('hour')) \
.group_by(A.collection_id, A.holder_address, func.date_trunc('hour', A.create_at)).all()
This will throw error
sqlalchemy.exc.ProgrammingError: (pg8000.dbapi.ProgrammingError) {'S': 'ERROR', 'V': 'ERROR', 'C': '42803', 'M': 'column "a.create_at" must appear in the GROUP BY clause or be used in an aggregate function', 'P': '97', 'F': 'parse_agg.c', 'L': '1413', 'R': 'check_ungrouped_columns_walker'}
[SQL: SELECT a.collection_id AS a_collection_id, a.holder_address AS a_holder_address, date_trunc(%s, a.create_at) AS hour
FROM a GROUP BY a.collection_id, a.holder_address, date_trunc(%s, a.create_at)]
[parameters: ('hour', 'hour')]
(Background on this error at: https://sqlalche.me/e/14/f405)
So I think the next step is to determine if the problem lies with SQLAlchemy or pg8000. You can see the SQL that SQLAlchemy generates (I think by just using the str function). So then you should be able to create a Minimal Reproducible Example using just pg8000 (or using just SQLAlchemy if the problem lies there). It's easier said than done I know, so good luck in getting to the bottom of it!
Seems using just pg8000 is ok
from pg8000.native import Connection
with Connection(user="postgres", database="test") as con:
con.run(
"""SELECT a.collection_id AS a_collection_id, a.holder_address AS a_holder_address, date_trunc('hour', a.create_at) AS hour FROM a GROUP BY a.collection_id, a.holder_address, date_trunc('hour', a.create_at)"""
And using sqlalchemy + psycopg2
has no error too, so I think maybe there is something wrong with sqlalchemy and pg8000 in this case
I think I've worked out what's going on here. Breaking it down to its simplest, the following fails:
from sqlalchemy import Column, DateTime, Integer, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base
Base = declarative_base()
class A(Base):
__tablename__ = "a"
__table_args__ = {"prefixes": ["TEMPORARY"]}
id = Column(Integer, primary_key=True)
create_at = Column(DateTime)
engine = create_engine("postgresql+pg8000://postgres@localhost/test")
Base.metadata.create_all(engine)
session = Session(engine)
q = select(func.date_trunc("hour", A.create_at)).group_by(
func.date_trunc("hour", A.create_at)
)
session.execute(q)
But instead doing the following succeeds:
from sqlalchemy import Column, DateTime, Integer, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base
Base = declarative_base()
class A(Base):
__tablename__ = "a"
__table_args__ = {"prefixes": ["TEMPORARY"]}
id = Column(Integer, primary_key=True)
create_at = Column(DateTime)
engine = create_engine("postgresql+pg8000://postgres@localhost/test")
Base.metadata.create_all(engine)
session = Session(engine)
q = select(func.date_trunc("hour", A.create_at).label("hour_col")).group_by("hour_col")
session.execute(q)
So the key is that in the GROUP BY, the column should be referred to by name. The reason it works in psycopg2 is because it substitutes in the variables before sending to the server, whereas pg8000 sends the SQL with placeholders separately from the values. Since in this case the placeholders have different names ($1 and $2 behind the scenes), the server thinks that the column in the GROUP BY wasn't defined.
Awesome, so will it be fixed or recorded in the document?
I can't think of a change that could make to pg8000 to improve things, but let me know if you can think of anything. As for a change to the docs, I'm not sure what we could put in the pg8000 docs to help people, but again, let me know if you have a suggestion.
Such query can be executed successfully in CLI or psycopg2
But with pg8000 gave such error:
'column "holder_activity.create_at" must appear in the GROUP BY clause or be used in an aggregate function'