pg dialect when rendering index expressions needs to apply self_group() the same way core DDL compiler does

MIrinkov commented 12 months ago

Describe the bug Alembic fails to generate a syntactically correct index migration when running alembic revision --autogenerate when creating an index on a field in a JSONB column using format like sqlalchemy.Index("idx_meta_file_id", meta["file_id"].astext) (where meta is a column in a model class, meta: Mapped[dict] = mapped_column(sqlalchemy.dialects.postgresql.JSONB)).

The constructed migration operations are:

    op.create_table(
        "test_table",
        sa.Column("id", sa.Integer(), nullable=False),
        sa.Column(
            "meta", postgresql.JSONB(astext_type=sa.Text()), nullable=False
        ),
        sa.PrimaryKeyConstraint("id"),
    )
    op.create_index(
        "idx_meta_file_id",
        "test_table",
        [sa.text("meta ->> 'file_id'")],
        unique=False,
    )

Running those with alembic upgrade head results in

sqlalchemy.exc.ProgrammingError: (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) <class 'asyncpg.exceptions.PostgresSyntaxError'>: syntax erro
r at or near "->>"
[SQL: CREATE INDEX idx_meta_file_id ON test_table (meta ->> 'file_id')]

Expected behavior The [sa.text("meta ->> 'file_id'")], is missing parentheses, it is supposed to be [sa.text("(meta ->> 'file_id')")], because the correct postgres syntax for this index is

-- notice the double parentheses
CREATE INDEX idx_meta_file_id ON test_table ((meta ->> 'file_id'));

I can currently achieve this by replacing the .astext version with

sqlalchemy.Index("idx_meta_file_id", sqlalchemy.text("(meta ->> 'file_id')")) # Notice the parentheses withing the text expression

My question is - is this a known bug? Did I miss some configuration that's required for dialect JSONB to work properly? Or am I expected to use text() expressions in such cases?

To Reproduce

from sqlalchemy.ext.asyncio import AsyncAttrs
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
import sqlalchemy.dialects.postgresql

class Base(AsyncAttrs, DeclarativeBase):
    pass

class TestTable(Base):
    __tablename__ = "test_table"
    id: Mapped[int] = mapped_column(primary_key=True)
    meta: Mapped[dict] = mapped_column(sqlalchemy.dialects.postgresql.JSONB)
    __table_args__ = (
        sqlalchemy.Index("idx_meta_file_id", meta["file_id"].astext),
        # # The version below actually works.
        # sqlalchemy.Index("idx_meta_file_id", sqlalchemy.text("(meta ->> 'file_id')")),
    )

Then run alembic revision --autogenerate, and the results will be as stated above - the required "extra" parentheses will be missing. Running alembic upgrade head will then result in the following error.

Error

(venv) PS C:\Users\User\project\migrations> alembic upgrade head
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.        
INFO  [alembic.runtime.migration] Running upgrade  -> 2614768f9f92, test
Traceback (most recent call last):
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 528, in _prepare_and_execute
    prepared_stmt, attributes = await adapt_connection._prepare(
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 778, in _prepare
    prepared_stmt = await self._connection.prepare(
  File "C:\Users\User\project\venv\lib\site-packages\asyncpg\connection.py", line 565, in prepare
    return await self._prepare(
  File "C:\Users\User\project\venv\lib\site-packages\asyncpg\connection.py", line 583, in _prepare
    stmt = await self._get_statement(
  File "C:\Users\User\project\venv\lib\site-packages\asyncpg\connection.py", line 397, in _get_statement
    statement = await self._protocol.prepare(
  File "asyncpg\protocol\protocol.pyx", line 168, in prepare
asyncpg.exceptions.PostgresSyntaxError: syntax error at or near "->>"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\engine\default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 585, in execute
    self._adapt_connection.await_(
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 125, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 185, in greenlet_spawn
    value = await result
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 564, in _prepare_and_execute
    self._handle_exception(error)
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 515, in _handle_exception
    self._adapt_connection._handle_exception(error)
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 802, in _handle_exception
    raise translated_error from error
sqlalchemy.dialects.postgresql.asyncpg.AsyncAdapt_asyncpg_dbapi.ProgrammingError: <class 'asyncpg.exceptions.PostgresSyntaxError'>: syntax error at or near "->>"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\User\project\venv\Scripts\alembic.exe\__main__.py", line 7, in <module>
  File "C:\Users\User\project\venv\lib\site-packages\alembic\config.py", line 630, in main
    CommandLine(prog=prog).main(argv=argv)
  File "C:\Users\User\project\venv\lib\site-packages\alembic\config.py", line 624, in main
    self.run_cmd(cfg, options)
  File "C:\Users\User\project\venv\lib\site-packages\alembic\config.py", line 601, in run_cmd
    fn(
  File "C:\Users\User\project\venv\lib\site-packages\alembic\command.py", line 399, in upgrade
    script.run_env()
  File "C:\Users\User\project\venv\lib\site-packages\alembic\script\base.py", line 578, in run_env
    util.load_python_file(self.dir, "env.py")
  File "C:\Users\User\project\venv\lib\site-packages\alembic\util\pyfiles.py", line 93, in load_python_file
    module = load_module_py(module_id, path)
  File "C:\Users\User\project\venv\lib\site-packages\alembic\util\pyfiles.py", line 109, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\Users\User\project\migrations\.\env.py", line 89, in <module>
    run_migrations_online()
  File "C:\Users\User\project\migrations\.\env.py", line 83, in run_migrations_online
    asyncio.run(run_async_migrations())
  File "C:\Program Files\Python310\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Program Files\Python310\lib\asyncio\base_events.py", line 641, in run_until_complete
    return future.result()
  File "C:\Users\User\project\migrations\.\env.py", line 75, in run_async_migrations
    await connection.run_sync(do_run_migrations)
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\ext\asyncio\engine.py", line 884, in run_sync
    return await greenlet_spawn(fn, self._proxied, *arg, **kw)
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 192, in greenlet_spawn
    result = context.switch(value)
  File "C:\Users\User\project\migrations\.\env.py", line 59, in do_run_migrations
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "C:\Users\User\project\venv\lib\site-packages\alembic\runtime\environment.py", line 937, in run_migrations
    self.get_context().run_migrations(**kw)
  File "C:\Users\User\project\venv\lib\site-packages\alembic\runtime\migration.py", line 624, in run_migrations
    step.migration_fn(**kw)
  File "C:\Users\User\project\migrations\versions\2614768f9f92_test.py", line 74, in upgrade
    op.create_index(
  File "<string>", line 8, in create_index
  File "<string>", line 3, in create_index
  File "C:\Users\User\project\venv\lib\site-packages\alembic\operations\ops.py", line 994, in create_index
    return operations.invoke(op)
  File "C:\Users\User\project\venv\lib\site-packages\alembic\operations\base.py", line 393, in invoke
    return fn(self, operation)
  File "C:\Users\User\project\venv\lib\site-packages\alembic\operations\toimpl.py", line 105, in create_index
    operations.impl.create_index(idx, **kw)
  File "C:\Users\User\project\venv\lib\site-packages\alembic\ddl\postgresql.py", line 94, in create_index
    self._exec(CreateIndex(index, **kw))
  File "C:\Users\User\project\venv\lib\site-packages\alembic\ddl\impl.py", line 193, in _exec
    return conn.execute(  # type: ignore[call-overload]
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1412, in execute
    return meth(
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\sql\ddl.py", line 181, in _execute_on_connection
    return connection._execute_ddl(
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1524, in _execute_ddl
    ret = self._execute_context(
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1844, in _execute_context
    return self._exec_single_context(
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1984, in _exec_single_context
    self._handle_dbapi_exception(
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\engine\base.py", line 2339, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\engine\default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 585, in execute
    self._adapt_connection.await_(
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 125, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 185, in greenlet_spawn
    value = await result
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 564, in _prepare_and_execute
    self._handle_exception(error)
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 515, in _handle_exception
    self._adapt_connection._handle_exception(error)
  File "C:\Users\User\project\venv\lib\site-packages\sqlalchemy\dialects\postgresql\asyncpg.py", line 802, in _handle_exception
    raise translated_error from error
sqlalchemy.exc.ProgrammingError: (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) <class 'asyncpg.exceptions.PostgresSyntaxError'>: syntax error at or near "->>"
[SQL: CREATE INDEX idx_meta_file_id ON test_table (meta ->> 'file_id')]
(Background on this error at: https://sqlalche.me/e/20/f405)

Versions.

OS: Win 10
Python: 3.10.1
Alembic: 1.12.0
SQLAlchemy: 2.0.21
Database: PostgreSQL 15.4 (Debian 15.4-1.pgdg120+1) on x86_64-pc-linux-gnu
DBAPI:

Have a nice day!

zzzeek commented 12 months ago

it's a bit strange because a plain create table does do the parenthesis in that weird "double" way

from sqlalchemy.orm import declarative_base
from sqlalchemy import Column, Integer, Text, Index
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy import create_engine

Base = declarative_base()

class Thing(Base):
    __tablename__ = 'thing'

    id = Column(Integer, primary_key=True)
    data = Column(JSONB(astext_type=Text()))

    __table_args__ = (
        Index("idx_1", data["file_id"].astext),
    )

e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
Base.metadata.drop_all(e)
Base.metadata.create_all(e)

CREATE TABLE thing (
    id SERIAL NOT NULL, 
    data JSONB, 
    PRIMARY KEY (id)
)

2023-10-12 13:01:42,808 INFO sqlalchemy.engine.Engine [no key 0.00063s] {}
2023-10-12 13:01:42,835 INFO sqlalchemy.engine.Engine CREATE INDEX idx_1 ON thing ((data ->> 'file_id'))

obviously the text rendering for alembic is trying to normalize parens out and is failing for some reason here

zzzeek commented 12 months ago

OK well the PG dialect in SQLAlchemy just calls self_group() on these expressions. so you can work around like this:

class TestTable(Base):
    __tablename__ = "test_table"
    id: Mapped[int] = mapped_column(primary_key=True)
    meta: Mapped[dict] = mapped_column(sqlalchemy.dialects.postgresql.JSONB)
    __table_args__ = (
        sqlalchemy.Index("idx_meta_file_id", meta["file_id"].astext.self_group()),
    )

that should resolve the issue here

zzzeek commented 12 months ago

the thing is , autogenerate on the second run will fail anyway as we dont do a good job comparing these indexes, so you'd really need to define the text like this probably:

sa.text('(data ->> \'$."file_id"\'::text)')

sqla-tester commented 12 months ago

Mike Bayer has proposed a fix for this issue in the main branch:

apply PG ddl paren rules to index expressions https://gerrit.sqlalchemy.org/c/sqlalchemy/alembic/+/4921

CaselIT commented 11 months ago

the thing is , autogenerate on the second run will fail anyway as we dont do a good job comparing these indexes, so you'd really need to define the text like this probably:
sa.text('(data ->> \'$."file_id"\'::text)')

so pg returns these like that? wow that's unfortunate.

Do you want to add a rule to ignore any index that contains an expression with -> and ->>?

sqlalchemy / alembic

pg dialect when rendering index expressions needs to apply self_group() the same way core DDL compiler does #1322