xzkostyan / clickhouse-sqlalchemy

ClickHouse dialect for SQLAlchemy
https://clickhouse-sqlalchemy.readthedocs.io
Other
412 stars 117 forks source link

LIMIT BY & FINAL are lost in SQLAlchemy 1.4 #198

Open nolar opened 1 year ago

nolar commented 1 year ago

First of all, thanks for the library! It is really helpful and easy to use!

Describe the bug

When using .limit_by(...) with SQLAclhemy 1.4, it is lost in the final SQL query.

To Reproduce

from clickhouse_sqlalchemy import engines, types, get_declarative_base, make_session
from sqlalchemy import create_engine, MetaData, Column, String, Table
from sqlalchemy import Column, PrimaryKeyConstraint

url = 'clickhouse+native://default:default@clickhouse:9000'
engine = create_engine(url, echo=True)
metadata = MetaData(bind=engine)
ClickHouseBase = get_declarative_base(metadata=metadata)

class MyTable(ClickHouseBase):
    fld = Column(types.UInt32())
    __tablename__ = "mytable"
    __table_args__ = (
        PrimaryKeyConstraint("fld"),
        engines.MergeTree(order_by=["fld"]),
    )

ch = make_session(engine)#.__enter__()

q = ch.query(MyTable.fld).limit_by([MyTable.fld], 1)
print(str(q))

Expected behavior

The LIMIT BY clause should be rendered in the SQL:

SELECT mytable.fld AS mytable_fld FROM mytable LIMIT 1 BY mytable.fld

Observed in SA1.4:

SELECT mytable.fld AS mytable_fld FROM mytable

Observed in SA1.3 (as expected):

SELECT mytable.fld AS mytable_fld FROM mytable LIMIT %(param_1)s BY mytable.fld

Versions

Versions for SA1.4 (where it fails):

pip install clickhouse_sqlalchemy==0.2.2 sqlalchemy==1.4.40

Versions for SA1.3 (where it worked):

pip install clickhouse_sqlalchemy==0.1.10 sqlalchemy==1.3.10

Python 3.9.7.

nolar commented 1 year ago

PS: Using select() gives the same results:

from clickhouse_sqlalchemy import select
q = select(MyTable.fld).limit_by([MyTable.fld], 1)
print(str(q))
# SELECT mytable.fld FROM mytable
nolar commented 1 year ago

I am not a big expert on SQLAlchemy's internals. But if that would be helpful, the loss of the LimitByClause happens in ORMSelectCompileState._setup_for_generate(): the incoming self.select_statement (of type Select, not Query) contains the LimitByClause, the outgoing self.statement does not contain it — all during a visit_select() call.

Looking at how the new Select is constructed in ORMSelectCompileState._simple_statement(), it seems it only passes specific clauses, not the custom ones.

I guess there should be a method or a hook to intercept and modify to include the limit_by=…, but I could not find any good place for that :-(

nolar commented 1 year ago

To the extent of my SQLAlchemy incompetence, this dirty hack "solves" the problem. Though I have no idea why it works and how it works, I just randomly added a few pieces together. I also have no idea what to do with it next — it might be very specific for my environment setup.

from sqlalchemy.orm.context import ORMSelectCompileState
from sqlalchemy.sql.base import CompileState

@CompileState.plugin_for("orm", "select")
class ClickHouseORMSelectCompileState(ORMSelectCompileState):
    """
    A dirty hack to fix the SQLAlchemy 1.4 + ClickHouse ORM integration.

    Read more: https://github.com/xzkostyan/clickhouse-sqlalchemy/issues/198
    """
    @classmethod
    def create_for_statement(cls, statement, compiler, **kw):
        state = super().create_for_statement(statement, compiler, **kw)

        # Also, maybe: _with_totals, _array_join, sample_clause(_sample).
        # See: ``clickhouse_sqlalchemy.orm.query:Query._compile_context()``
        for key, clause_key in (('_final', '_final_clause'), ('_limit_by', '_limit_by_clause')):
            if key in statement.__dict__:  # for session.query(…).final().limit_by(…)
                state.statement.__dict__[clause_key] = statement.__dict__[key]
            if clause_key in statement.__dict__:  # for select(…).final().limit_by(…)
                state.statement.__dict__[clause_key] = statement.__dict__[clause_key]

        return state

q = ch.query(MyTable.fld).final().limit_by([MyTable.fld], 1)
print(str(q))
# SELECT mytable.fld AS mytable_fld
# FROM mytable
# FINAL LIMIT %(param_1)s BY mytable.fld

q = select(MyTable.fld).final().limit_by([MyTable.fld], 1)
print(str(q))
# SELECT mytable.fld
# FROM mytable
# FINAL LIMIT %(param_1)s BY mytable.fld
gtoonstra commented 1 year ago

What is passed to the compiler is not actually the query object, not sure if that changed:

https://github.com/sqlalchemy/sqlalchemy/blob/6f75807063771496a34b7725d2565acf2528d76f/lib/sqlalchemy/orm/query.py#L2719

It is a statement object that is passed into the clickhouse compiler object. Somewhere in SQLAlchemy 1.4, some copy is made, losing clickhouse specific attributes:

https://github.com/xzkostyan/clickhouse-sqlalchemy/blob/b60c62488ea52068a85e32cb7828238cd80c0cc5/clickhouse_sqlalchemy/drivers/compilers/sqlcompiler.py#L234

attributes on the clickhouse Query object are set here:

https://github.com/xzkostyan/clickhouse-sqlalchemy/blob/b60c62488ea52068a85e32cb7828238cd80c0cc5/clickhouse_sqlalchemy/orm/query.py#L64

The str() implementation of the sqlalchemy Query object:

https://github.com/sqlalchemy/sqlalchemy/blob/6f75807063771496a34b7725d2565acf2528d76f/lib/sqlalchemy/orm/query.py#L2718

Not sure if str(q) is used anywhere, but it could be the key to resolving it. Probably also means that none of the other clickhouse specific statements work.

gtoonstra commented 1 year ago

What ends up happening:

Probably in SQLAlchemy 1.4, this is needed:

https://github.com/sqlalchemy/sqlalchemy/blob/cc72081b0c32dbd089fb9601747f448b65414640/lib/sqlalchemy/sql/compiler.py#L3780

Looking into that. Hopefully I can raise a PR for it.

gtoonstra commented 1 year ago

@xzkostyan : Do you have any idea? There are definitely some changes introduced in 1.4 that break the internal SQL Compiler.

This file no longer overrides select create:

https://github.com/xzkostyan/clickhouse-sqlalchemy/blob/b60c62488ea52068a85e32cb7828238cd80c0cc5/clickhouse_sqlalchemy/sql/selectable.py#L66

maybe because they started using different methods for that now:

https://github.com/sqlalchemy/sqlalchemy/blob/479dbc99e7fc5a60f846992c0cca8542047a8933/lib/sqlalchemy/orm/query.py#L579

Tried overriding that, but didn't work.

gtoonstra commented 1 year ago

@xzkostyan Do you have any idea, or are we using this wrong?

limony-city commented 6 months ago

Any updates? Still there in SQLAclhemy 2.0