kvesteri / sqlalchemy-utils

Various utility functions and datatypes for SQLAlchemy.
Other
1.27k stars 322 forks source link

Wrong aggregate() on polymorhic join inherited table #226

Open ashl1 opened 8 years ago

ashl1 commented 8 years ago

I make a test showed the problem at https://github.com/ashl1/sqlalchemy-utils/tree/fix-aggregate-join-inherited

I researched deep into this and found:

1) This example produce wrong SQL with vector multiplied table records

UPDATE car_catalog SET net_worth=(SELECT sum(product.price) AS sum_1 
FROM product, car_catalog 
WHERE car_catalog.id = product.catalog_id) WHERE car_catalog.id IN (:id_1, :id_2)

instead of

UPDATE car_catalog SET net_worth=(SELECT sum(product.price) AS sum_1 
FROM product 
WHERE car_catalog.id = product.catalog_id) WHERE car_catalog.id IN (:id_1, :id_2)

So, this produce incorrect sum.

2) The difference from existed test appears in '_correlate': {<sqlalchemy.sql.selectable.Join at 0x7f9eba7895f8; Join object on catalog(140319677835136) and car_catalog(140319710024760)>} of query.

In existed tests uses correlation directly for {Table('catalog', MetaData(bind=None), Column('id', Integer(), table=<catalog>, primary_key=True, nullable=False), Column('name', Unicode(length=255), table=<catalog>), Column('type', Unicode(length=255), table=<catalog>), Column('net_worth', Numeric(), table=<catalog>, default=ColumnDefault(0)), schema=None)}

3) I found similar bug ticket for sqlalchemy at https://bitbucket.org/zzzeek/sqlalchemy/issues/3662/correlation-fails-with-with_polymorphic

Maintainer @zzzeek said in commit

Using correlated subqueries against polymorphic mappings still has some unpolished edges. If for example Person is polymorphically linked to a so-called "concrete polymorphic union" query, the above subquery may not correctly refer to this subquery. In all cases, a way to refer to the "polyorphic" entity fully is to create an :func:.aliased object from it first::

    # works with all SQLAlchemy versions and all types of polymorphic
    # aliasing.

    paliased = aliased(Person)
    sess.query(paliased.name)
                .filter(
                    sess.query(Company.name).
                    filter(Company.company_id == paliased.company_id).
                    correlate(paliased).as_scalar() == "Elbonia, Inc.")

The :func:.aliased construct guarantees that the "polymorphic selectable" is wrapped in a subquery. By referring to it explicitly in the correlated subquery, the polymorphic form is correctly used.

By the all I'm not sure is this upstream incorrect behavior and where this should be fixed.

ashl1 commented 8 years ago

SA-Utils uses select().correlate() which is located in different module in SA: sql/selectable.py:3177 with old behavior. I tried to reproduce the problem in flat SA to report a bug, but it seems to produce different SQL. Could you check the code to reproduce the problem in SA?

from sqlalchemy import Column, Integer, ForeignKey, String, DateTime
from sqlalchemy import alias, and_, func
from sqlalchemy import create_engine, inspect, select
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import join, relationship, sessionmaker

engine = create_engine('sqlite:///:memory:', echo=True)
Session = sessionmaker(bind=engine)
Base = declarative_base()

class Parent(Base):
    __tablename__ = 'parent_objs'
    id = Column(Integer, primary_key=True)
    obj_type = Column(String, nullable=False)
    name = Column(String)
    __mapper_args__ = {
        'polymorphic_on': obj_type,
        'with_polymorphic': '*'
    }

class Child(Parent):
    __tablename__ = 'child_objs'
    id = Column(Integer, ForeignKey('parent_objs.id'), primary_key=True)
    columnA = Column(String)
    _sum = Column(Integer)
    related = relationship('Related', backref='child')
    __mapper_args__ = {
        'polymorphic_identity': 'child',
    }

class Related(Base):
    __tablename__ = 'related'
    id = Column(Integer, primary_key=True)
    child_id = Column(Integer, ForeignKey('child_objs.id'))
    num = Column(Integer)

Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)

session = Session()

c1 = Child(columnA='some1')
c2 = Child(columnA='some2')

rc1 = Related(num=1)
rc2 = Related(num=2)
c1.related.append(rc1)
c2.related.append(rc2)

session.add(c1)
session.add(c2)
session.add(rc1)
session.add(rc2)
session.commit()

expr = func.sum(Related.num)
query = select([expr])

query = query.select_from(inspect(Related).selectable)
query.correlate(Child)

condition = join(
    Child.related.parent,
    Related,
    Child.related
).onclause
query.where(condition)
print("\n\n=================\n\n"+str(query)+"\n")

update_query = Child.__table__.update().values({
    '_sum': query
}).where(Child.id.in_([1, 2]))

print("\n\n=================\n\n"+str(update_query)+"\n")
Dem0n3D commented 7 years ago

This bug is still exists :(