kvesteri / sqlalchemy-continuum

Versioning extension for SQLAlchemy.
BSD 3-Clause "New" or "Revised" License
575 stars 127 forks source link

SAWarning: SELECT statement has a cartesian product between FROM element(s) #305

Open indiVar0508 opened 2 years ago

indiVar0508 commented 2 years ago

Getting this warning when running pytest tests/relationships/test_non_versioned_classes.py::TestManyToManyRelationshipToNonVersionedClass::test_single_insert

sqlalchemy_continuum/relationship_builder.py:86: SAWarning: SELECT statement has a cartesian product between FROM element(s) "tag" and FROM element "article_tag_version".  Apply join condition(s) between each element to resolve.
    return query.all()

I created a normal py-script from testfile to replicate the issue

import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, sessionmaker, configure_mappers
from sqlalchemy_continuum import make_versioned

make_versioned(user_cls=None)
Base = declarative_base()

class Article(Base):
    __tablename__ = 'article'
    __versioned__ = {
        'base_classes': (Base, )
    }
    id = sa.Column(sa.Integer, autoincrement=True, primary_key=True)
    name = sa.Column(sa.Unicode(255))

article_tag = sa.Table(
    'article_tag',
    Base.metadata,
    sa.Column(
        'article_id',
        sa.Integer,
        sa.ForeignKey('article.id'),
        primary_key=True,
    ),
    sa.Column(
        'tag_id',
        sa.Integer,
        sa.ForeignKey('tag.id'),
        primary_key=True
    )
)

class Tag(Base):
    __tablename__ = 'tag'

    id = sa.Column(sa.Integer, autoincrement=True, primary_key=True)
    name = sa.Column(sa.Unicode(255))

Tag.articles = sa.orm.relationship(
    Article,
    secondary=article_tag,
    backref='tags'
)
configure_mappers()
engine = sa.create_engine('sqlite:///:memory:', echo=True)
Base.metadata.create_all(engine)

session = sessionmaker(bind=engine)()

article = Article()
article.name = u'Some article'
article.content = u'Some content'
tag = Tag(name=u'some tag')
article.tags.append(tag)
session.add(article)
session.commit()
article.versions[0].tags

Logs:

2022-09-03 11:56:16,131 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-09-03 11:56:16,132 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("article")
2022-09-03 11:56:16,132 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-09-03 11:56:16,132 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("article")
2022-09-03 11:56:16,132 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-09-03 11:56:16,133 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("article_tag")
2022-09-03 11:56:16,133 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-09-03 11:56:16,133 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("article_tag")
2022-09-03 11:56:16,133 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-09-03 11:56:16,133 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("tag")
2022-09-03 11:56:16,133 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-09-03 11:56:16,134 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("tag")
2022-09-03 11:56:16,134 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-09-03 11:56:16,134 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("article_version")
2022-09-03 11:56:16,134 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-09-03 11:56:16,134 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("article_version")
2022-09-03 11:56:16,134 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-09-03 11:56:16,135 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("transaction")
2022-09-03 11:56:16,135 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-09-03 11:56:16,135 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("transaction")
2022-09-03 11:56:16,135 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-09-03 11:56:16,135 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("article_tag_version")
2022-09-03 11:56:16,135 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-09-03 11:56:16,136 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("article_tag_version")
2022-09-03 11:56:16,136 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-09-03 11:56:16,137 INFO sqlalchemy.engine.Engine 
CREATE TABLE article (
        id INTEGER NOT NULL, 
        name VARCHAR(255), 
        PRIMARY KEY (id)
)

2022-09-03 11:56:16,137 INFO sqlalchemy.engine.Engine [no key 0.00018s] ()
2022-09-03 11:56:16,138 INFO sqlalchemy.engine.Engine 
CREATE TABLE tag (
        id INTEGER NOT NULL, 
        name VARCHAR(255), 
        PRIMARY KEY (id)
)

2022-09-03 11:56:16,138 INFO sqlalchemy.engine.Engine [no key 0.00015s] ()
2022-09-03 11:56:16,139 INFO sqlalchemy.engine.Engine 
CREATE TABLE article_version (
        id INTEGER NOT NULL, 
        name VARCHAR(255), 
        transaction_id INTEGER NOT NULL, 
        end_transaction_id INTEGER, 
        operation_type SMALLINT NOT NULL, 
        PRIMARY KEY (id, transaction_id)
)

2022-09-03 11:56:16,139 INFO sqlalchemy.engine.Engine [no key 0.00017s] ()
2022-09-03 11:56:16,139 INFO sqlalchemy.engine.Engine CREATE INDEX ix_article_version_transaction_id ON article_version (transaction_id)
2022-09-03 11:56:16,140 INFO sqlalchemy.engine.Engine [no key 0.00016s] ()
2022-09-03 11:56:16,140 INFO sqlalchemy.engine.Engine CREATE INDEX ix_article_version_end_transaction_id ON article_version (end_transaction_id)
2022-09-03 11:56:16,140 INFO sqlalchemy.engine.Engine [no key 0.00013s] ()
2022-09-03 11:56:16,141 INFO sqlalchemy.engine.Engine CREATE INDEX ix_article_version_operation_type ON article_version (operation_type)
2022-09-03 11:56:16,141 INFO sqlalchemy.engine.Engine [no key 0.00014s] ()
2022-09-03 11:56:16,142 INFO sqlalchemy.engine.Engine 
CREATE TABLE "transaction" (
        issued_at DATETIME, 
        id INTEGER NOT NULL, 
        remote_addr VARCHAR(50), 
        PRIMARY KEY (id)
)

2022-09-03 11:56:16,142 INFO sqlalchemy.engine.Engine [no key 0.00018s] ()
2022-09-03 11:56:16,143 INFO sqlalchemy.engine.Engine 
CREATE TABLE article_tag_version (
        article_id INTEGER NOT NULL, 
        tag_id INTEGER NOT NULL, 
        transaction_id INTEGER NOT NULL, 
        end_transaction_id INTEGER, 
        operation_type SMALLINT NOT NULL, 
        PRIMARY KEY (article_id, tag_id, transaction_id)
)

2022-09-03 11:56:16,143 INFO sqlalchemy.engine.Engine [no key 0.00016s] ()
2022-09-03 11:56:16,143 INFO sqlalchemy.engine.Engine CREATE INDEX ix_article_tag_version_end_transaction_id ON article_tag_version (end_transaction_id)
2022-09-03 11:56:16,144 INFO sqlalchemy.engine.Engine [no key 0.00015s] ()
2022-09-03 11:56:16,144 INFO sqlalchemy.engine.Engine CREATE INDEX ix_article_tag_version_operation_type ON article_tag_version (operation_type)
2022-09-03 11:56:16,144 INFO sqlalchemy.engine.Engine [no key 0.00013s] ()
2022-09-03 11:56:16,145 INFO sqlalchemy.engine.Engine CREATE INDEX ix_article_tag_version_transaction_id ON article_tag_version (transaction_id)
2022-09-03 11:56:16,145 INFO sqlalchemy.engine.Engine [no key 0.00014s] ()
2022-09-03 11:56:16,146 INFO sqlalchemy.engine.Engine 
CREATE TABLE article_tag (
        article_id INTEGER NOT NULL, 
        tag_id INTEGER NOT NULL, 
        PRIMARY KEY (article_id, tag_id), 
        FOREIGN KEY(article_id) REFERENCES article (id), 
        FOREIGN KEY(tag_id) REFERENCES tag (id)
)

2022-09-03 11:56:16,146 INFO sqlalchemy.engine.Engine [no key 0.00017s] ()
2022-09-03 11:56:16,146 INFO sqlalchemy.engine.Engine COMMIT
2022-09-03 11:56:16,147 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-09-03 11:56:16,151 INFO sqlalchemy.engine.Engine INSERT INTO "transaction" (issued_at, remote_addr) VALUES (?, ?)
2022-09-03 11:56:16,151 INFO sqlalchemy.engine.Engine [generated in 0.00042s] ('2022-09-03 06:26:16.151053', None)
2022-09-03 11:56:16,153 INFO sqlalchemy.engine.Engine INSERT INTO article (name) VALUES (?)
2022-09-03 11:56:16,154 INFO sqlalchemy.engine.Engine [generated in 0.00034s] ('Some article',)
2022-09-03 11:56:16,155 INFO sqlalchemy.engine.Engine INSERT INTO tag (name) VALUES (?)
2022-09-03 11:56:16,155 INFO sqlalchemy.engine.Engine [generated in 0.00031s] ('some tag',)
2022-09-03 11:56:16,157 INFO sqlalchemy.engine.Engine INSERT INTO article_tag (article_id, tag_id) VALUES (?, ?)
2022-09-03 11:56:16,157 INFO sqlalchemy.engine.Engine [generated in 0.00061s] (1, 1)
2022-09-03 11:56:16,159 INFO sqlalchemy.engine.Engine INSERT INTO article_tag_version (article_id, tag_id, transaction_id, operation_type) VALUES (?, ?, ?, ?)
2022-09-03 11:56:16,159 INFO sqlalchemy.engine.Engine [generated in 0.00032s] (1, 1, 1, 0)
2022-09-03 11:56:16,166 INFO sqlalchemy.engine.Engine UPDATE article_version SET end_transaction_id=? WHERE article_version.transaction_id = (SELECT max(article_version_1.transaction_id) AS max_1 
FROM article_version AS article_version_1 
WHERE article_version_1.transaction_id < ? AND article_version_1.id = ?) AND article_version.id = ?
2022-09-03 11:56:16,166 INFO sqlalchemy.engine.Engine [generated in 0.00041s] (1, 1, 1, 1)
2022-09-03 11:56:16,168 INFO sqlalchemy.engine.Engine INSERT INTO article_version (id, name, transaction_id, end_transaction_id, operation_type) VALUES (?, ?, ?, ?, ?)
2022-09-03 11:56:16,169 INFO sqlalchemy.engine.Engine [generated in 0.00037s] (1, 'Some article', 1, None, 0)
2022-09-03 11:56:16,170 INFO sqlalchemy.engine.Engine COMMIT
2022-09-03 11:56:16,171 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-09-03 11:56:16,177 INFO sqlalchemy.engine.Engine SELECT article.id AS article_id, article.name AS article_name 
FROM article 
WHERE article.id = ?
2022-09-03 11:56:16,177 INFO sqlalchemy.engine.Engine [generated in 0.00038s] (1,)
2022-09-03 11:56:16,178 INFO sqlalchemy.engine.Engine SELECT article_version.id AS article_version_id, article_version.name AS article_version_name, article_version.transaction_id AS article_version_transaction_id, article_version.end_transaction_id AS article_version_end_transaction_id, article_version.operation_type AS article_version_operation_type 
FROM article_version 
WHERE ? = article_version.id ORDER BY article_version.transaction_id
 LIMIT ? OFFSET ?
2022-09-03 11:56:16,178 INFO sqlalchemy.engine.Engine [generated in 0.00363s] (1, 1, 0)
/home/indivar/github/sqlalchemy-continuum/sqlalchemy_continuum/relationship_builder.py:86: SAWarning: SELECT statement has a cartesian product between FROM element(s) "article_tag_version" and FROM element "tag".  Apply join condition(s) between each element to resolve.
  return query.all()
2022-09-03 11:56:16,182 INFO sqlalchemy.engine.Engine SELECT tag.id AS tag_id, tag.name AS tag_name 
FROM tag, article_tag_version 
WHERE ? = article_tag_version.article_id
2022-09-03 11:56:16,182 INFO sqlalchemy.engine.Engine [generated in 0.00036s] (1,)

This seem to be to happening only for SQLA > 1.4

indiVar0508 commented 2 years ago

This seem to be happening due to a change in SQLA from this thread, not sure i tried to look into it, might be happening due expression_reflector but not sure how , i can be completely wrong

Edit: I think issue lies in criteria method of relationship builder for MANYTOMANY direction for non-versioned object(s),

jbhanks commented 11 months ago

I think I am probably experiencing the same thing. I get the warning but as far as I can tell it isn't a problem as the output is exactly what I intended.

The schema:


class MediaType(Base):
    __tablename__ = "media_types"
    media_type: Mapped[str] = mapped_column(init=True, primary_key=True)
    file_format: Mapped[str] = mapped_column(
        default=None, unique=True, nullable=True
    )

class Collection(Base):
    __tablename__ = "collections"
    collection_id: Mapped[int] = mapped_column(
        init=False, primary_key=True, autoincrement=True
    )
    collection_name: Mapped[str] = mapped_column(default=None)
    collection_description: Mapped[str] = mapped_column(
        default=None, unique=False, nullable=True
    )
    media_type_id: Mapped[str] = mapped_column(
         "media_type",
        ForeignKey("media_types.media_type"), default=None
    )
    media_type: Mapped[MediaType] = relationship(init=False)
    collection_ezname: Mapped[str] = mapped_column(Computed(func.LOWER(func.REGEXP_REPLACE((collection_name), '[ /%<>#";?:@&=+$,.]', '_', 'g'))), unique=True, init=False)
    tags: Mapped[list]  = mapped_column(ARRAY(TEXT, dimensions=1), default=None, nullable=True)
    date_added: Mapped[datetime.datetime] = mapped_column(default=None, nullable=True)

    __mapper_args__ = {
        "polymorphic_identity": "collections",
        "eager_defaults": True,
    }

class Post(Base):
    __tablename__ = "posts"
    id: Mapped[int] = mapped_column(init=False, primary_key=True)
    title: Mapped[str] = mapped_column(default=None, nullable=True)
    description: Mapped[str] = mapped_column(default=None, nullable=True)

    collection_id: Mapped[int] = mapped_column(
        ForeignKey("collections.collection_id"), init=False
    )
    collection: Mapped[Collection] = relationship(kw_only=True)

    media_type: Mapped[str] = mapped_column(
        ForeignKey("media_types.media_type"), init=False
    )
    media_type_reference: Mapped[MediaType] = relationship(init=False)
    tags: Mapped[list]  = mapped_column(ARRAY(TEXT, dimensions=1), default=None, nullable=True)
    views: Mapped[int] = mapped_column(default=0, nullable=True)
    social_media: Mapped[dict] =mapped_column(JSONB, default=None, nullable=True)
    date_added: Mapped[datetime.datetime] = mapped_column(default=None, nullable=True)
    date_modified: Mapped[datetime.datetime] = mapped_column(default=None, nullable=True)
    rss_pub_date: Mapped[str] = mapped_column(default=None, nullable=True)
    added_to_rss: Mapped[bool] = mapped_column(default=False, nullable=True)

    __mapper_args__ = {
        "polymorphic_identity": "posts",
        "polymorphic_on": "media_type",
        "eager_defaults": True,
    }

class Link(Post):
    __tablename__ = "links"
    id: Mapped[int] = mapped_column(
        ForeignKey("posts.id"), init=False, primary_key=True
    )
    url: Mapped[str] = mapped_column(default=None, unique=True)
    other_info: Mapped[str] = mapped_column(default=None, nullable=True)
    clicks: Mapped[int] = mapped_column(default=0, nullable=True)

    __mapper_args__ = {
        "polymorphic_identity": "links",
        "eager_defaults": True,
    }

The line that makes the error:

    link_collections = db.session.query(Collection).where(MediaType.media_type == "links").all()
    print(link_collections)

The output (which, I stress, was the output I desired):

[Collection(collection_id=1, collection_name='Essentials', collection_description='', media_type_id='links', media_type=MediaType(media_type='links', file_format=None), collection_ezname='essentials', tags=None, date_added=None), Collection(collection_id=2, collection_name='NYC', collection_description='', media_type_id='links', media_type=MediaType(media_type='links', file_format=None), collection_ezname='nyc', tags=None, date_added=None), Collection(collection_id=3, collection_name='Amazing Art', collection_description='', media_type_id='links', media_type=MediaType(media_type='links', file_format=None), collection_ezname='amazing_art', tags=None, date_added=None), 
....
]
stv8 commented 2 months ago

I'm also getting this same warning, however its causing me troubles as it's returning ALL the models on one side of the many-to-many relationship.

For context I'm integrating into an existing project, I made one class versioned which made version tables for all of the many to many association tables.

Whenever I do video.versions[1].tags I get ALL of the tags that exist

sqlalchemy_continuum/relationship_builder.py:81: SAWarning: SELECT statement has a cartesian product between FROM element(s) "tags" and FROM element "video_tags_version".  Apply join condition(s) between each element to resolve.
  return query.all()

I'm on sqlalchemy 1.4.42 and continuum 1.4.2

stv8 commented 2 months ago

Update, I think I found the problem, indeed as @indiVar0508 mentioned, for context my Video class is versioned and my Tag class is not.

I think issue lies in criteria method of relationship builder for MANYTOMANY direction for non-versioned object(s),

The issue is that the join is only using the primaryjoin for all of the types of relationships while many-to-many setups can have a seocndaryjoin component as well. This is causing the cartesian product.

Example from my code

class Video(Base):
    # truncated class for brevity
    tags: Mapped[list["Tag"]] = relationship(
        "Tag",
        secondary=video_tags_association,
        back_populates="videos",
        primaryjoin="video_tags.c.videoid == Video.id",
        secondaryjoin="and_(video_tags.c.tag_id == Tag.id, Tag.deleted_at.is_(None))",
    )

and the reflector is only using primaryjoin

https://github.com/kvesteri/sqlalchemy-continuum/blob/91730be62d7d3e15a8c69e57d5b5aa1ecd82f1f4/sqlalchemy_continuum/relationship_builder.py#L95

I got it to work by combining the conditions into the primaryjoin

primaryjoin="and_(video_tags.c.videoid == Video.id, video_tags.c.tag_id == Tag.id, Tag.deleted_at.is_(None))"

but it would be great if the logic would consider secondaryjoins as SQLAlchemy recommends using primaryjoin + secondaryjoin when joining across multiple tables.