sqitchers / sqitch

Sensible database change management
https://sqitch.org
MIT License

MySQL deploy blocked by get_lock on another schema #670

Closed DimitriyKh closed 2 years ago

DimitriyKh commented 2 years ago

Since the get_lock call in

lib/App/Sqitch/Engine/mysql.pm:310: q{SELECT get_lock('sqitch working', 0)}

acquires a server-wide exclusive lock, there is no way to run simultaneous deploys to different schemas.
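
For illustration, a rough sketch of what the server-wide lock means in practice (the lock name is the one hard-coded in mysql.pm; the return values follow MySQL's documented GET_LOCK behavior; named locks are scoped to the server, not to a schema):

-- session 1: sqitch deploy connected to one schema
SELECT get_lock('sqitch working', 0);   -- returns 1: lock acquired
-- session 2: sqitch deploy connected to a different schema on the same server
SELECT get_lock('sqitch working', 0);   -- returns 0: blocked, because named locks are server-wide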

Also, the error message

Blocked by another instance of Sqitch working on {TARGET}; waiting 60 seconds...

appears to be misleading: it displays the target name of the current deploy, not the target that holds the lock.

Steps to reproduce: create the target DBs, users, and grants:

create schema t1;
create schema t1_sqitch;

create schema t2;
create schema t2_sqitch;

create user t1;
grant all privileges on t1.* to t1;
grant all privileges on t1_sqitch.* to t1;

create user t2;
grant all privileges on t2.* to t2;
grant all privileges on t2_sqitch.* to t2;

Create two sqitch projects, run sqitch init in each, and add a target per project, so that each project's sqitch.conf looks as follows:

[core]
        engine = mysql
        plan_file = sqitch.plan
        # top_dir = .
# [engine "mysql"]
        # target = db:mysql:
        # registry = sqitch
        # client = mysql
[target "t1"]
        uri = db:mysql://t1@127.0.0.1:3306/t1
        registry = t1_sqitch

[core]
        engine = mysql
        plan_file = sqitch.plan
        # top_dir = .
# [engine "mysql"]
        # target = db:mysql:
        # registry = sqitch
        # client = mysql
[target "t2"]
        uri = db:mysql://t2@127.0.0.1:3306/t2
        registry = t2_sqitch

Add a sample migration with sqitch add in each project; the deploy script is:

BEGIN;
select sleep(90);
COMMIT;

Run sqitch in two terminal windows in parallel:

project1: sqitch deploy -t t1
project2: sqitch deploy -t t2

While the first run works, the second one returns an error:

Blocked by another instance of Sqitch working on t2; waiting 60 seconds...
Timed out waiting 60 seconds for another instance of Sqitch to finish work on t2
DimitriyKh commented 2 years ago

I would suggest using the project/target name in get_lock somehow, or maybe there is a way to disable get_lock entirely.

DimitriyKh commented 2 years ago

Something like this would also work for me:

diff --git a/lib/App/Sqitch/Engine/mysql.pm b/lib/App/Sqitch/Engine/mysql.pm
--- a/lib/App/Sqitch/Engine/mysql.pm    (revision caaa1ef5ee0260cc02bf5a0ba81a8475f5d30749)
+++ b/lib/App/Sqitch/Engine/mysql.pm    (date 1660561471871)
@@ -317,8 +317,9 @@
     my $self = shift;
     # Can't create a lock in the registry if it doesn't exist.
     $self->initialize unless $self->initialized;
+    my $lock_name = $self->uri->dbname;
     $self->dbh->selectcol_arrayref(
-        q{SELECT get_lock('sqitch working', 0)}
+        qq{SELECT get_lock('sqitch working on $lock_name', 0)}
     )->[0]
 }
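
With a lock name scoped to the connected database like this, a rough sketch of the resulting behavior for the t1/t2 reproduction above (return values per MySQL's GET_LOCK semantics):

-- session 1: deploy connected to t1
SELECT get_lock('sqitch working on t1', 0);   -- returns 1
-- session 2: deploy connected to t2
SELECT get_lock('sqitch working on t2', 0);   -- returns 1: different lock name, no conflict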
theory commented 2 years ago

It should probably lock on the registry name.

theory commented 2 years ago

On second thought I'm a bit stumped on what to do here, since MySQL allows one to make changes to any database without reconnecting. They're effectively schemas, not databases. This is why the lock name today is global to the server: to prevent any conflicting runs at all.

I suspect very few people connect to one database and make changes to another, but it's only a suspicion, and I certainly have written Postgres Sqitch projects that update multiple schemas. And the lock there, which covers the whole database, ensures that only one instance updating that database can run at a time. This is identical to the behavior of the MySQL engine, except that there are no completely independent namespaces in a single MySQL server.

So I'm not entirely sure what to do here. The options, as I see them, are:

  1. Leave it as-is, and people just need to deploy to one server instance at a time, regardless of what database they connect to or what registry they use. This is the safest approach.
  2. Use a lock name that includes the database name. This will not prevent concurrent runs that connect to different databases but potentially make changes to a common database (or registry), which could cause conflicts.
  3. Use a lock name that includes the registry name. This will not prevent concurrent runs that connect to different databases but make changes to a common database, but it seems, perhaps, a bit less likely, since they will have to update different registries. But it could be annoying if one uses the same registry to manage unrelated databases/projects.
  4. Use a lock name that includes both the connecting database name and the registry database name. This is a little tighter, but only slightly better than locking on the registry, since one can still have two jobs that change a common database; on the other hand, one can still use the same registry for different databases (the lock names for options 2-4 are sketched after this list).
  5. Leave it as-is but provide some other method to set the lock name, perhaps an environment variable? Seems a little wonky and overly special-cased for MySQL.
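
To make the options concrete, a sketch of the lock names each scoping choice would produce for the t1 / t1_sqitch setup above (the exact name format is illustrative only):

-- option 2: scope the lock to the connected database
SELECT get_lock('sqitch working on t1', 0);
-- option 3: scope the lock to the registry
SELECT get_lock('sqitch working on t1_sqitch', 0);
-- option 4: scope the lock to both the database and the registry
SELECT get_lock('sqitch working on t1 t1_sqitch', 0);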
DimitriyKh commented 2 years ago

Database is an alias for schema in MySQL, so the current approach blocks not only the database but the whole server, ruling out parallel deploys that do not actually conflict. So the lock has to be either removed or renamed. We have tens of clients on a single "server", and deploying them one by one is a total waste of time. As for locking on the registry name, I see no reason for it, since you already lock the whole registry itself in Engine/mysql.pm:

# Override to lock the Sqitch tables. This ensures that only one instance of
# Sqitch runs at one time.
sub begin_work {
    my $self = shift;
    my $dbh = $self->dbh;

    # Start transaction and lock all tables to disallow concurrent changes.
    $dbh->do('LOCK TABLES ' . join ', ', map {
        "$_ WRITE"
    } qw(releases changes dependencies events projects tags));
    $dbh->begin_work;
    return $self;
}
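
For reference, the join above produces a single statement against the registry, roughly:

LOCK TABLES releases WRITE, changes WRITE, dependencies WRITE, events WRITE, projects WRITE, tags WRITE;

So two deploys that share a registry are already serialized by this table lock, independently of get_lock.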
theory commented 2 years ago

Okay, option 2 has the least problematic trade-offs, and is probably the most intuitive solution: one would probably expect the lock to be scoped to just the one database you connect to.

theory commented 2 years ago

Proposed fix in #677.