googleapis / python-spanner-sqlalchemy

Apache License 2.0
39 stars 28 forks source link

Alembic generated migrations support #229

Closed IlyaFaer closed 2 years ago

IlyaFaer commented 2 years ago

Check how limitations are handled by generated migrations:

IlyaFaer commented 2 years ago

@asthamohta, @ansh0l

IlyaFaer commented 2 years ago

Problem with primary keys in alembic_version table should not be a problem.

Alembic only generates migration files, while env.py file is written manually by developers. In this env.py file a parameter version_table_pk should be used to prevent Alembic from using primary keys in alembic_version table, see the docs for details: https://github.com/googleapis/python-spanner-sqlalchemy#migration

ansh0l commented 2 years ago

@louimet Would you be able able to take a look at this issue and let us know the error you are facing?

louimet commented 2 years ago

Working on v3.19.0, I used the following declarative class to generate two migrations. The first creates the 'test' table with only the 'id' column, the second adds the 'name column.

engine = create_engine(settings.SPANNER_URL)
metadata = MetaData(bind=engine)
Base = declarative_base(metadata=metadata)

class Test(Base):
    __tablename__ = 'test'

    id = Column(String(3), primary_key=True)
    name = Column(String(3))

Running the first migration works fine. But running the second migration gives :

root ➜ /workspaces/hydra-monorepo/migrations (google_sqlalchemy ✗) $ ./alembic-google upgrade head
INFO  [alembic.runtime.migration] Context impl SpannerImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 16cbbcf75d48 -> b1c6f829e41b, add column
Traceback (most recent call last):
  File "/root/.local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 140, in error_remapped_callable
    return _StreamingResponseIterator(
  File "/root/.local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 66, in __init__
    self._stored_first_result = next(self._wrapped)
  File "/root/.local/lib/python3.9/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/root/.local/lib/python3.9/site-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.ABORTED
        details = "Transaction aborted. Database schema probably changed during transaction, retry may succeed."
        debug_error_string = "{"created":"@1661283549.074681776","description":"Error received from peer ipv4:172.217.13.170:443","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Transaction aborted. Database schema probably changed during transaction, retry may succeed.","grpc_status":10}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_dbapi/cursor.py", line 269, in execute
    self._itr = PeekIterator(self._result_set)
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_dbapi/utils.py", line 38, in __init__
    head = next(itr_src)
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_v1/streamed.py", line 145, in __iter__
    self._consume_next()
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_v1/streamed.py", line 117, in _consume_next
    response = next(self._response_iterator)
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_v1/snapshot.py", line 59, in _restart_on_unavailable
    iterator = method(request=request)
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_v1/services/spanner/client.py", line 1138, in execute_streaming_sql
    response = rpc(
  File "/root/.local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 154, in __call__
    return wrapped_func(*args, **kwargs)
  File "/root/.local/lib/python3.9/site-packages/google/api_core/timeout.py", line 99, in func_with_timeout
    return func(*args, **kwargs)
  File "/root/.local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 144, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.Aborted: 409 Transaction aborted. Database schema probably changed during transaction, retry may succeed. [retry_delay {
  nanos: 10589511
}
]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/root/.local/lib/python3.9/site-packages/google/cloud/sqlalchemy_spanner/sqlalchemy_spanner.py", line 1006, in do_execute
    cursor.execute(statement, parameters)
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_dbapi/cursor.py", line 70, in wrapper
    return function(cursor, *args, **kwargs)
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_dbapi/cursor.py", line 272, in execute
    self.connection.retry_transaction()
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_dbapi/connection.py", line 263, in retry_transaction
    self._rerun_previous_statements()
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_dbapi/connection.py", line 298, in _rerun_previous_statements
    _compare_checksums(statement.checksum, retried_checksum)
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_dbapi/checksum.py", line 78, in _compare_checksums
    raise RetryAborted(
google.cloud.spanner_dbapi.exceptions.RetryAborted: The transaction was aborted and could not be retried due to a concurrent modification.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.local/bin/alembic", line 8, in <module>
    sys.exit(main())
  File "/root/.local/lib/python3.9/site-packages/alembic/config.py", line 590, in main
    CommandLine(prog=prog).main(argv=argv)
  File "/root/.local/lib/python3.9/site-packages/alembic/config.py", line 584, in main
    self.run_cmd(cfg, options)
  File "/root/.local/lib/python3.9/site-packages/alembic/config.py", line 561, in run_cmd
    fn(
  File "/root/.local/lib/python3.9/site-packages/alembic/command.py", line 322, in upgrade
    script.run_env()
  File "/root/.local/lib/python3.9/site-packages/alembic/script/base.py", line 569, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/root/.local/lib/python3.9/site-packages/alembic/util/pyfiles.py", line 94, in load_python_file
    module = load_module_py(module_id, path)
  File "/root/.local/lib/python3.9/site-packages/alembic/util/pyfiles.py", line 110, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "google_error_reporting/env.py", line 83, in <module>
    run_migrations_online()
  File "google_error_reporting/env.py", line 77, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/root/.local/lib/python3.9/site-packages/alembic/runtime/environment.py", line 853, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/root/.local/lib/python3.9/site-packages/alembic/runtime/migration.py", line 630, in run_migrations
    head_maintainer.update_to_step(step)
  File "/root/.local/lib/python3.9/site-packages/alembic/runtime/migration.py", line 854, in update_to_step
    self._update_version(from_, to_)
  File "/root/.local/lib/python3.9/site-packages/alembic/runtime/migration.py", line 791, in _update_version
    ret = self.context.impl._exec(
  File "/root/.local/lib/python3.9/site-packages/alembic/ddl/impl.py", line 195, in _exec
    return conn.execute(construct, multiparams)
  File "/root/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1380, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/root/.local/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 333, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/root/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
    ret = self._execute_context(
  File "/root/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
    self._handle_dbapi_exception(
  File "/root/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "/root/.local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/root/.local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/root/.local/lib/python3.9/site-packages/google/cloud/sqlalchemy_spanner/sqlalchemy_spanner.py", line 1006, in do_execute
    cursor.execute(statement, parameters)
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_dbapi/cursor.py", line 70, in wrapper
    return function(cursor, *args, **kwargs)
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_dbapi/cursor.py", line 272, in execute
    self.connection.retry_transaction()
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_dbapi/connection.py", line 263, in retry_transaction
    self._rerun_previous_statements()
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_dbapi/connection.py", line 298, in _rerun_previous_statements
    _compare_checksums(statement.checksum, retried_checksum)
  File "/root/.local/lib/python3.9/site-packages/google/cloud/spanner_dbapi/checksum.py", line 78, in _compare_checksums
    raise RetryAborted(
sqlalchemy.exc.OperationalError: (google.cloud.spanner_dbapi.exceptions.RetryAborted) The transaction was aborted and could not be retried due to a concurrent modification.
[SQL: UPDATE alembic_version SET version_num='b1c6f829e41b' WHERE alembic_version.version_num = '16cbbcf75d48']
(Background on this error at: https://sqlalche.me/e/14/e3q8)

Note that examining the equivalent DDL for the alembic_version table in the database gives this:

image

So it would seem that the version_num column is tagged as the primary key even though we specify version_table_pk=False in our env.py as prescribed.

IlyaFaer commented 2 years ago

@louimet, hm-m, let's think...

As far as I understand, you've tried to do a first migration, it failed. Than you've added this version_table_pk=False argument into evn.py and tried again, right?

I suppose, in this case it'll not work, because alembic_versions table is already created. According to the docs, the argument version_table_pk only used on the alembic_version table creation (during the very first migration of your service). Thus, I suspect, to make it work now, you should drop the table to force Alembic to create it again, but without this PK constraint.

louimet commented 2 years ago

@IlyaFaer No, version_table_pk was False from the start.

I thought that might be it in our actual database, that we might have added that subsequently. That's why I started from scratch, creating a new database, to make sure that I had that setting with the correct value from the start. And I still get that error.

I just checked again :

The error doesn't obviously point to a problem with the primary key in the alembic_version table, rather mentionning a "concurrent modification". It could be a complete red herring, but looking at the details, I'm a bit confused.

It's not obvious that the shown "equivalent DDL" is the actual DDL that was used to create the table. It mentions PRIMARY KEY, but it's not clear whether the table has one or not, as shown by the following comparison.

image

Also, using the information mentioned here, I looked up the details of the database, and it looks like the alembic_version table does indeed have a primary key constraint, but no column attached to it.

image

Again, maybe the PK is not the culprit here, but then I don't see what could be the "concurrent modification" that's making the transaction fail. Also note that running both migrations in a fresh new database in a single 'alembic upgrade head' works perfectly fine. That really points to some problem during the update in the alembic_version table.

Here's my complete env.py file.

from logging.config import fileConfig

from alembic import context
from alembic.ddl.impl import DefaultImpl

from shared.config import settings
from shared.db import models

class SpannerImpl(DefaultImpl):
    """Makes dialect discoverable by alembic"""

    __dialect__ = 'spanner+spanner'

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config

# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
    fileConfig(config.config_file_name)

# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
target_metadata = models.Base.metadata

# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.

def run_migrations_offline():
    """Run migrations in 'offline' mode.

    This configures the context with just a URL
    and not an Engine, though an Engine is acceptable
    here as well.  By skipping the Engine creation
    we don't even need a DBAPI to be available.

    Calls to context.execute() here emit the given string to the
    script output.

    """
    url = settings.SPANNER_URL
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={'paramstyle': 'named'},
    )

    with context.begin_transaction():
        context.run_migrations()

def run_migrations_online():
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.

    """
    connectable = target_metadata.bind

    with connectable.connect() as connection:
        context.configure(
            connection=connection, target_metadata=target_metadata, render_as_batch=True, version_table_pk=False
        )

        with context.begin_transaction():
            context.run_migrations()

if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()
IlyaFaer commented 2 years ago

@louimet, I was able to reproduce it - thanks for the detailed description! I think I understand the problem now (you can scroll to the bottom of the post for the solution).

@ansh0l, @asthamohta, this probably should be a question for the Spanner backend team.

Logging all the operations, made by Alembic during migration, I see this:

SELECT alembic_version.version_num FROM alembic_version

INFO  [alembic.runtime.migration] Running upgrade 7f0ef95eb985 -> 6c6c3c5f73a2, second_mig

CREATE TABLE test2 (
        id STRING(3) NOT NULL
) PRIMARY KEY (id)

UPDATE alembic_version SET version_num='6c6c3c5f73a2' WHERE alembic_version.version_num = '7f0ef95eb985'

There are no concurrent updates, but UPDATE alembic_versions still fails with the error 409 Transaction aborted. Database schema probably changed during transaction, retry may succeed (I disabled auto retry mode to see the original error message, so it's a bit different)

Here should be noted that the migration is actually applied, and the table test2 is created. But because the Alembic revision wasn't updated, on the next migration Alembic will try to repeat the migration, which is already applied.

The weird moment is this: if these operations are executed separately and manually, they all are passing fine. Overthinking it a bit, I decided to try to put Spanner connections into AUTOCOMMIT mode by default, and migration passed fine:

Untitled

And here a couple of questions appear:

@louimet, looks like the solution for your case is to make Alembic work with Spanner in autocommit mode. Fortunately, Alembic already have a tool for cases like ours, see the docs. I tried it locally, seems to be solving the problem:

Untitled2

Let me know if it solves your problem. If yes, I'll update the repo docs to mention this case and the solution.

louimet commented 2 years ago

Yes, it seems to solve the problem.

def upgrade() -> None:
    with op.get_context().autocommit_block():
        op.create_table('test2', sa.Column('id', sa.String(length=3), nullable=False), sa.PrimaryKeyConstraint('id'))

Running it gives this:

INFO  [alembic.runtime.migration] Running upgrade 67449554fba0 -> a0cb48593ca3, add 2nd table, with autocommit
/root/.local/lib/python3.9/site-packages/google/cloud/sqlalchemy_spanner/sqlalchemy_spanner.py:983: UserWarning: This method is non-operational in autocommit mode
  dbapi_connection.commit()

The table is created properly, and the alembic_version table is updated with the correct version number, so everything looks fine.

However, since Alembic's autocommit is based on SQLAlchemy's autocommit, which is going to be deprecated in SqlAlchemy 2.0, I'm not sure this would be a viable long-term solution, it will at least require some tweaking (but we can use that in the meantime).

On the whole though, isn't that a hint that there's an underlying problem in the way transactions are segmented?

IlyaFaer commented 2 years ago

However, since Alembic's autocommit is based on SQLAlchemy's autocommit, which is going to be deprecated in SqlAlchemy 2.0, I'm not sure this would be a viable long-term solution, it will at least require some tweaking (but we can use that in the meantime).

Well, we're not the only database, which requires autocommit during migrations (and autocommit on the whole), so there will be solution for all of us in SQLAlchemy 2.0.

On the whole though, isn't that a hint that there's an underlying problem in the way transactions are segmented?

I'd say rules of transactions abortion are too strict for now. In this particular case it seems to be safe to run DDL and continue transaction, but it's still aborted. We brought this to responsible engineers' attention, will see what they'll say.

louimet commented 2 years ago

We brought this to responsible engineers' attention, will see what they'll say.

Cool, thanks!

zashraf1985 commented 9 months ago

This issue and this solution should be added to the main doc. I spent more than a day struggling with this problem and just when i was about to give up, i thought to take a look at closed issues and here it is. The autocommit worked like charm with a warning. Please add this to the main docs. It will save a lot of time for others.

jerrydeng commented 2 months ago

The issue happened to me as well but it was failing in alembic_version migration table creation step. It wasn't able to create migration version table due to lack of Primary key defined. Running the following directly on psql using PGAdapter failed with ERROR: Primary key must be defined for table "alembic_version".

CREATE TABLE alembic_version (
        version_num VARCHAR(32) NOT NULL
)

I understand that Spanner require primary key and primary key cannot be updated. That's why the env.py config on alembic needs to be set to

with connectable.connect() as connection:
    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        version_table_pk=False,  # don't use primary key in the versions table
    )

But this just got rid of the PRIMARY syntax from the alembic_version table creation step, not assigning a new required primary key.