sqlalchemy / alembic

A database migrations tool for SQLAlchemy.
MIT License
2.85k stars 247 forks source link

Autogenerate renders TypeDecorator instance instead of underlying impl type #1386

Open saifelse opened 10 months ago

saifelse commented 10 months ago

Describe the bug

This isn't a bug per se, but a small improvement for autogenerate when using TypeDecorator.

When a TypeDecorator is used in a column definition, e.g.:

"""
File: app/models/foo.py
"""
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.types import TypeDecorator
...

class JSONBData(TypeDecorator):
   impl = JSONB

foo = Table("foo", MetaData(), Column("data", JSONBData))

The auto-generated migration ends up referencing the TypeDecorator:

op.add_column("foo", sa.Column("data", app.models.foo.JSONBData(), nullable=True))

which is annoying for two reasons:

  1. The import is not automatically rendered.
  2. The migration file has an unnecessary dependency on app, e.g. if the app/models/foo.py is refactored, we may need to update this migration file... when that could have been avoided if instead of rendering app.models.foo.JSONBData, alembic directly rendered the underlying impl: postgresql.JSONB.

I'm not sure if there are any scenarios where it is actually preferable to have the TypeDecorator in the autogenerated file. If there are use cases for it, would it be sensible to make this a config option instead of unconditional?

Expected behavior Ideally, we'd generate the same thing as when foo = Table("foo", MetaData(), Column("data", JSONB)) is provided, i.e.:

op.add_column("foo", sa.Column("data", postgresql.JSONB(astext_type=Text()), nullable=True))

To Reproduce

Test case:

diff --git a/tests/test_autogen_render.py b/tests/test_autogen_render.py
index eeeb92e..9755869 100644
--- a/tests/test_autogen_render.py
+++ b/tests/test_autogen_render.py
@@ -33,6 +33,7 @@ from sqlalchemy.sql import false
 from sqlalchemy.sql import literal_column
 from sqlalchemy.sql import table
 from sqlalchemy.types import TIMESTAMP
+from sqlalchemy.types import TypeDecorator
 from sqlalchemy.types import UserDefinedType

 from alembic import autogenerate
@@ -1078,6 +1079,21 @@ class AutogenRenderTest(TestBase):
             "server_default='5', nullable=True))",
         )

+    def test_render_add_column_type_decorator(self):
+        self.autogen_context.opts["user_module_prefix"] = None
+
+        class MyType(TypeDecorator):
+            impl = Integer
+
+        op_obj = ops.AddColumnOp(
+            "foo", Column("x", MyType, server_default="5")
+        )
+        eq_ignore_whitespace(
+            autogenerate.render_op_text(self.autogen_context, op_obj),
+            "op.add_column('foo', sa.Column('x', sa.Integer(), "
+            "server_default='5', nullable=True))",
+        )
+
     @testing.emits_warning("Can't validate argument ")
     def test_render_add_column_custom_kwarg(self):
         col = Column(

Error

When running tox -e py-sqlalchemy -- tests/test_autogen_render.py with the above patch:

  File "/Users/saif/contrib/alembic/tests/test_autogen_render.py", line 1091, in test_render_add_column_type_decorator
    eq_ignore_whitespace(
  File "/Users/saif/contrib/alembic/alembic/testing/assertions.py", line 111, in eq_ignore_whitespace
    assert a == b, msg or "%r != %r" % (a, b)
           ^^^^^^
AssertionError: "op.add_column('foo', sa.Column('x', tests.test_autogen_render.MyType(), server_default='5', nullable=True))" != "op.add_column('foo', sa.Column('x', sa.Integer(), server_default='5', nullable=True))"

Versions.

Have a nice day!

CaselIT commented 10 months ago

Hi

I'm not sure if there are any scenarios where it is actually preferable to have the TypeDecorator in the autogenerated file. If there are use cases for it, would it be sensible to make this a config option instead of unconditional?

Well a type decorator can render a different type depending on the dialect, so this change if accepted would need to be made a config option. For example https://docs.sqlalchemy.org/en/20/core/custom_types.html#backend-agnostic-guid-type

Also at the moment alembic support customization of what type is rendered, documented here: https://alembic.sqlalchemy.org/en/latest/autogenerate.html#affecting-the-rendering-of-types-themselves

maybe an extension to that documentation to show an example function to render the impl of type decorator could be enough? @zzzeek opinion on this?

zzzeek commented 10 months ago

""" File: app/models/foo.py """ from sqlalchemy.dialects.postgresql import JSONB from sqlalchemy.types import TypeDecorator ...

class JSONBData(TypeDecorator): impl = JSONB

foo = Table("foo", MetaData(), Column("data", JSONBData))


The auto-generated migration ends up referencing the TypeDecorator:

```python
op.add_column("foo", sa.Column("data", app.models.foo.JSONBData(), nullable=True))

which is annoying for two reasons:

1. The import is not automatically rendered.

we have a configuration path for this which is to add your application's standard type import paths to the script.py.mako template. There's some other ways to add imports dynamically as well but are not as easy to document.

2. The migration file has an unnecessary dependency on `app`, e.g. if the app/models/foo.py is refactored, we may need to update this migration file... when that could have been avoided if instead of rendering `app.models.foo.JSONBData`, alembic directly rendered the underlying impl: `postgresql.JSONB`.

if you are refactoring your application to move where a particular symbol imports from, you necessarily have to change all the places that reference it, so I dont see why an alembic migration file is any different than any other file which refers to it.

I'm not sure if there are any scenarios where it is actually preferable to have the TypeDecorator in the autogenerated file.

there is, which is that the op.create_table() directive actually returns the Table object it just created so you can then use it for bulk_insert , where you definitely want your TypeDecorator to take place.

If there are use cases for it, would it be sensible to make this a config option instead of unconditional?

there already is! make yourself a render_item callable that does this for all typedecorators, I will gladly accept an add for cookbook for this:

def render_item(type_, obj, autogen_context):
    """Apply custom rendering for selected items."""

    if type_ == 'type' and isinstance(obj, TypeDecorator):
        return f"sa.{obj.impl!r}"

    # default rendering for other objects
    return False

def run_migrations_online():
    # ...

    context.configure(
                connection=connection,
                target_metadata=target_metadata,
                render_item=render_item,
                # ...
                )

    # ...
saifelse commented 10 months ago

Thanks @CaselIT and @zzzeek for the feedback!

For some additional context on where I'm coming from, I've already implemented a monkeypatch that is applying a change similar to the one I've proposed that we've been using for a couple of years now, so that for most part developers get correctly generated migrations... but at the cost of someone maintaining ugly monkeypatch code in sync with alembic's changelog, so I was hoping to fold this into alembic source code if it would make sense. It looks like there are already cases where folks want to preserve their type decorators in the migration, so finding the right configuration instead of monkey patching / broadly making the change seems sensible to me.

monkeypatch if you're curious ```py from alembic import context from alembic.autogenerate.render import _repr_type import sqlalchemy.types as types migration_context = context.get_context() old_render_type = migration_context.impl.render_type def new_render_type(type_, autogen_context): # If we are dealing with a custom TypeDecorator, run _repr_type on the underlying implementation. if isinstance(type_, types.TypeDecorator) and type(type_).__module__.startswith("app."): return _repr_type(type_.impl, autogen_context) # Otherwise, do the default behavior. return old_render_type(type_, autogen_context) migration_context.impl.render_type = new_render_type ```

But in the interest of properly communicating why this sort of recipe is useful, I'll respond to some of the things you both have mentioned 😇.

if you are refactoring your application to move where a particular symbol imports from, you necessarily have to change all the places that reference it, so I dont see why an alembic migration file is any different than any other file which refers to it.

That's generally true... for our project, we've decided to stricly enfoce that migrations are disjoint from app, which has some pretty big benefits:

There are many ways to combat the above issues, which I concur are not universal issues, but we found this decoupling to work for us.

Well a type decorator can render a different type depending on the dialect, so this change if accepted would need to be made a config option.

FWIW, where I put my proposed fix (#1387) happens after a dialect has attempted to render the type, so I think the mentioned use case still works - at least no test broke where I put my logic.

we have a configuration path for this which is to add your application's standard type import paths to the script.py.mako template. There's some other ways to add imports dynamically as well but are not as easy to document.

Good to know. In my monolith applicaiton, we have many places that a type decorator could live, so a standard set of imports isn't totally doable, and the overhead of custom code for adding dynamic imports sound like a heavier burden on the team, also not allowed by my teams' "migrations does not depend on app" requirement.

there already is! make yourself a render_item callable that does this for all typedecorators, I will gladly accept an add for cookbook for this:

Nice! I'll try this out, and attempt to write up a cookbook recipe.

saifelse commented 10 months ago

re: render_item, one thing to note is that with my proposed implementation when dealing with impl = JSONB case, we get the from sqlalchemy.dialects import postgresql import for free since it uses the existing machinery, whereas using render_item, I think I'll have to duplicate some of the logic from https://github.com/sqlalchemy/alembic/blob/abc8002ec67ddcb0a0be56b8167a4837f3884217/alembic/autogenerate/render.py#L815-L835 , which isn't the end of the world, just more complexity to maintain in userland.

One way of implementing render_item would be to do something very similar to the monkeypatch I shared above, where instead of adding in our custom logic, we directly invoke alembic.autogenerate.render._repr_type. As this is currently a private function, the recipe would be fairly brittle unless _repr_type were "public" and changes to its signature would be called out as a minor/major breaking change?

zzzeek commented 10 months ago

this is the thing with recipes. Think of real world use. You have an app, and you're using like two or three database-specific types, you know you're using postgresql.

Realistically, I dont really think having the TypeDecorator render is really any kind of issue, I dont really feel "maintenance" or "bugs" are impacted very much at all by this rendering and certainly not "performance", database migrations can certainly afford an extra .3 seconds startup to import your application's utility module, but as stated, there's the render_item() recipe and you can do whatever you want.

so re: render_item's "automatic logic with imports", again, real world - your app is hardcoded to Postgresql, so OK, add "from sqlalchemy.dialects import postgresql" to your script.py.mako template. that's how 99% of people would solve this, and it's not a problem. but you want to be more fancy, you can do that also, the imports collection is passed to the hook via the autogen_context, see the next paragraph at https://alembic.sqlalchemy.org/en/latest/autogenerate.html#affecting-the-rendering-of-types-themselves :

def render_item(type_, obj, autogen_context):
    """Apply custom rendering for selected items."""

    if type_ == 'type' and isinstance(obj, MySpecialType):
        # add import for this type
        autogen_context.imports.add("from mymodel import types")
        return "types.%r" % obj

    # default rendering for other objects
    return False

I think overall the vibe I want to impart is that this is a simple thing Alembic does and it allows a hook to customize, we want to keep things simple, not overthink, not qualify all of Alembic's API with N^18 flags and switches, just imperative code where needed.

lachaib commented 1 month ago

Hello, Bringing my 2 cents here, it's really bothersome for many developers, generally not proficient with alembic, even with recipes (which TBH are most of the time not well followed), to have lots of extra scripts they won't understand and be able to maintain. Besides, some developers may use type decorators without being aware of it (for example using some from sqlalchemy-utils)

I really think there is a possibility to ease there life even to the slightest.

For instance, I took the original test case, and instead of expecting the TypeDecorator's impl to the rendering, which I understand may be complex when some type decorators may be dialect dependent, I changed the expectation to add the import automatically. It took just one line to add:

    else:
        prefix = _user_autogenerate_prefix(autogen_context, type_)
+        autogen_context.imports.add(prefix.rstrip("."))
        return "%s%r" % (prefix, type_)

This single line of code may lighten 100s of repositories which would have to copy your above recipe.

The good thing is that this imports attribute is a set so having alembic pre-fill it would remain compatible with those who also import it manually, but for new projects or people reading carefully the release notes, that'll be less code in their repositories 😁

How would you feel about this trade-off?

zzzeek commented 1 month ago
    else:
        prefix = _user_autogenerate_prefix(autogen_context, type_)
+        autogen_context.imports.add(prefix.rstrip("."))
        return "%s%r" % (prefix, type_)

This single line of code may lighten 100s of repositories which would have to copy your above recipe.

The good thing is that this imports attribute is a set so having alembic pre-fill it would remain compatible with those who also import it manually, but for new projects or people reading carefully the release notes, that'll be less code in their repositories 😁

we're talking about a single line of code in their script.py.mako file or a similar line in their env.py file.

How would you feel about this trade-off?

we can't assume that a user-defined type is importable based on what it says in the __module__ attribute. the user defined type could be inside of a class definition, it could be generated on the fly from a function, and overall people usually have specific conventions they want to use for importing their own library code into source files. by hardcoding it here means it creates a bigger problem for all these cases that something "magical and wrong" is happening.

as mentioned above, we have a prominent documentation section which explains why Alembic works the way it does as well as exactly what someone needs to do when they are using non-SQLAlchemy types. SQLAlchemy and Alembic both subscribe to a philosophy of zero assumptions about a user environment and it's assumed that users will be reading documentation.