tarantool / vshard

The new generation of sharding based on virtual buckets

Tests fail if `box.cfg` is patched to re-apply dynamic options unconditionally #411

Closed locker closed 1 year ago

locker commented 1 year ago

Historically, Tarantool's box.cfg doesn't even try to re-apply a dynamic configuration option if the option value is unchanged. In particular, box.cfg{replication = box.cfg.replication} and box.cfg{listen = box.cfg.listen} are no-ops, no matter what the current state is. This is inconvenient, because sometimes we do need to take an action in this case, so it should be up to the configuration callback to decide whether it needs to do anything when an option is set to the same value. For example, even if all URIs are the same, we might still need to re-apply the box.cfg.replication or box.cfg.listen config in order to reload SSL certificate files, see https://github.com/tarantool/tarantool-ee/issues/432.
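
To illustrate the current behavior described above, here is a minimal sketch (the port number is just an example):

```lua
-- Current behavior: re-submitting the same value is silently skipped,
-- so nothing is re-applied even if the runtime state should be refreshed.
box.cfg{listen = 3301}
box.cfg{listen = box.cfg.listen}            -- no-op today: the socket is not rebound
box.cfg{replication = box.cfg.replication}  -- no-op today: appliers are not restarted
```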

That said, we're planning to change the box.cfg behavior so that it always invokes the underlying callback, even if the option values are unchanged, see https://github.com/tarantool/tarantool/issues/8551. The replication configuration callback already tries to avoid re-connecting to master instances, see commit https://github.com/tarantool/tarantool/commit/5994892c2f2c34823356efe5d8560e31e9ad7c42, so in theory everything should continue to work fine. However, this change in the box.cfg logic would still introduce a behavior change, because the box.cfg.replication callback still tries to connect to and authenticate with all master instances. If this fails, box.cfg may fail as well. This seems to break some vshard tests, see the integration workflow in https://github.com/tarantool/tarantool/pull/8552.

In the case of vshard, it seems that a test calls box.cfg{replication = box.cfg.replication} after a cluster bootstrap but before granting the access rights that let replicas in. With https://github.com/tarantool/tarantool/pull/8552 applied, Tarantool tries to authenticate on all the instances, which aren't bootstrapped yet, so it hangs. See the logs below.

<details>
<summary>logs</summary>

**Tarantool branch**

```
gh pr checkout 8552
```

**Vshard test command**

```
./test-run.py router-luatest/router_test.lua
```

**replica_1_a.log**

```
2023-04-12 18:03:54.431 [1031472] main/103/storage.lua I> Tarantool 3.0.0-entrypoint-165-g8b4b088325b3 Linux-x86_64-Debug
2023-04-12 18:03:54.431 [1031472] main/103/storage.lua I> log level 5 (INFO)
2023-04-12 18:03:54.431 [1031472] main/103/storage.lua I> wal/engine cleanup is paused
2023-04-12 18:03:54.431 [1031472] main/103/storage.lua I> mapping 268435456 bytes for memtx tuple arena...
2023-04-12 18:03:54.431 [1031472] main/103/storage.lua I> Actual slab_alloc_factor calculated on the basis of desired slab_alloc_factor = 1.044274
2023-04-12 18:03:54.432 [1031472] main/103/storage.lua I> mapping 134217728 bytes for vinyl tuple arena...
2023-04-12 18:03:54.439 [1031472] main/103/storage.lua I> update replication_synchro_quorum = 1
2023-04-12 18:03:54.440 [1031472] main/103/storage.lua I> instance uuid 00000000-0000-0000-0000-000000000002
2023-04-12 18:03:54.440 [1031472] main/103/storage.lua I> tx_binary: bound to unix/:(socket)
2023-04-12 18:03:54.440 [1031472] main/103/storage.lua I> connecting to 2 replicas
2023-04-12 18:03:54.440 [1031472] main/111/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_b.iproto I> can't connect to master
2023-04-12 18:03:54.440 [1031472] main/111/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_b.iproto sio.c:209 E> SocketError: connect to unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_b.iproto, called on fd 39, aka unix/:(socket): No such file or directory
2023-04-12 18:03:54.440 [1031472] main/111/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_b.iproto I> will retry every 0.10 second
2023-04-12 18:03:54.441 [1031472] main/110/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_a.iproto I> remote master 00000000-0000-0000-0000-000000000002 at unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_a.iproto running Tarantool 3.0.0
2023-04-12 18:03:54.540 [1031472] main/111/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_b.iproto I> remote master 00000000-0000-0000-0000-000000000003 at unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_b.iproto running Tarantool 3.0.0
2023-04-12 18:03:54.541 [1031472] main/103/storage.lua I> connected to 2 replicas
2023-04-12 18:03:54.541 [1031472] main/103/storage.lua I> Checking if 00000000-0000-0000-0000-000000000002 at storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_a.iproto chose this instance as bootstrap leader
2023-04-12 18:03:54.541 [1031472] main/103/storage.lua I> Checking if 00000000-0000-0000-0000-000000000003 at storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_b.iproto chose this instance as bootstrap leader
2023-04-12 18:03:54.541 [1031472] main/103/storage.lua I> initializing an empty data directory
2023-04-12 18:03:54.583 [1031472] main/103/storage.lua I> assigned id 1 to replica 00000000-0000-0000-0000-000000000002
2023-04-12 18:03:54.583 [1031472] main/103/storage.lua I> update replication_synchro_quorum = 1
2023-04-12 18:03:54.583 [1031472] main/103/storage.lua I> cluster uuid 00000000-0000-0000-0000-000000000001
2023-04-12 18:03:54.585 [1031472] snapshot/101/main I> saving snapshot `./00000000000000000000.snap.inprogress'
2023-04-12 18:03:54.589 [1031472] snapshot/101/main I> done
2023-04-12 18:03:54.590 [1031472] main/103/storage.lua I> ready to accept requests
2023-04-12 18:03:54.590 [1031472] main/104/gc I> wal/engine cleanup is resumed
2023-04-12 18:03:54.590 [1031472] iproto sio.c:285 E> SocketError: writev(1), called on fd 41, aka unix/:(socket), peer of unix/:(socket): Broken pipe
2023-04-12 18:03:54.590 [1031472] main/103/storage.lua/box.load_cfg I> set 'replication_timeout' configuration option to 0.1
2023-04-12 18:03:54.590 [1031472] main/103/storage.lua/box.load_cfg I> set 'log' configuration option to "\/home\/vlad\/src\/tarantool\/vshard\/test\/var\/001_router-luatest\/replica_1_a-_jITjz2gGx87\/replica_1_a.log"
2023-04-12 18:03:54.590 [1031472] main/103/storage.lua/box.load_cfg I> set 'replication' configuration option to ["storage@unix\/:\/home\/vlad\/src\/tarantool\/vshard\/test\/var\/001_router-luatest\/replica_1_a.iproto","storage@unix\/:\/home\/vlad\/src\/tarantool\/vshard\/test\/var\/001_router-luatest\/replica_1_b.iproto"]
2023-04-12 18:03:54.590 [1031472] main/103/storage.lua/box.load_cfg I> set 'listen' configuration option to "\/home\/vlad\/src\/tarantool\/vshard\/test\/var\/001_router-luatest\/replica_1_a.iproto"
2023-04-12 18:03:54.590 [1031472] main/103/storage.lua/box.load_cfg I> set 'instance_uuid' configuration option to "00000000-0000-0000-0000-000000000002"
2023-04-12 18:03:54.590 [1031472] main/103/storage.lua/box.load_cfg I> set 'replicaset_uuid' configuration option to "00000000-0000-0000-0000-000000000001"
2023-04-12 18:03:54.590 [1031472] main/105/checkpoint_daemon I> scheduled next checkpoint for Wed Apr 12 19:05:13 2023
2023-04-12 18:03:54.590 [1031472] main/111/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_b.iproto I> failed to authenticate
2023-04-12 18:03:54.590 [1031472] main/111/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_b.iproto box.cc:3694 E> ER_LOADING: Instance bootstrap hasn't finished yet
2023-04-12 18:03:54.590 [1031472] main/111/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_b.iproto I> will retry every 0.10 second
2023-04-12 18:03:54.592 [1031472] main I> entering the event loop
```

**replica_1_b.log**

```
2023-04-12 18:03:54.433 [1031473] main/103/storage.lua I> Tarantool 3.0.0-entrypoint-165-g8b4b088325b3 Linux-x86_64-Debug
2023-04-12 18:03:54.433 [1031473] main/103/storage.lua I> log level 5 (INFO)
2023-04-12 18:03:54.433 [1031473] main/103/storage.lua I> wal/engine cleanup is paused
2023-04-12 18:03:54.433 [1031473] main/103/storage.lua I> mapping 268435456 bytes for memtx tuple arena...
2023-04-12 18:03:54.433 [1031473] main/103/storage.lua I> Actual slab_alloc_factor calculated on the basis of desired slab_alloc_factor = 1.044274
2023-04-12 18:03:54.433 [1031473] main/103/storage.lua I> mapping 134217728 bytes for vinyl tuple arena...
2023-04-12 18:03:54.441 [1031473] main/103/storage.lua I> update replication_synchro_quorum = 1
2023-04-12 18:03:54.441 [1031473] main/103/storage.lua I> instance uuid 00000000-0000-0000-0000-000000000003
2023-04-12 18:03:54.441 [1031473] main/103/storage.lua I> tx_binary: bound to unix/:(socket)
2023-04-12 18:03:54.441 [1031473] main/103/storage.lua I> connecting to 2 replicas
2023-04-12 18:03:54.441 [1031473] main/110/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_a.iproto I> remote master 00000000-0000-0000-0000-000000000002 at unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_a.iproto running Tarantool 3.0.0
2023-04-12 18:03:54.443 [1031473] main/111/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_b.iproto I> remote master 00000000-0000-0000-0000-000000000003 at unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_b.iproto running Tarantool 3.0.0
2023-04-12 18:03:54.443 [1031473] main/103/storage.lua I> connected to 2 replicas
2023-04-12 18:03:54.443 [1031473] main/110/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_a.iproto I> failed to authenticate
2023-04-12 18:03:54.443 [1031473] main/110/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_a.iproto box.cc:3694 E> ER_LOADING: Instance bootstrap hasn't finished yet
2023-04-12 18:03:54.443 [1031473] main/110/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_a.iproto I> will retry every 0.10 second
2023-04-12 18:03:54.679 [1031473] main/110/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_a.iproto I> failed to authenticate
2023-04-12 18:03:54.679 [1031473] main/110/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_a.iproto authentication.c:116 E> ER_CREDS_MISMATCH: User not found or supplied credentials are invalid
2023-04-12 18:03:54.679 [1031473] main/110/applier/storage@unix/:/home/vlad/src/tarantool/vshard/test/var/001_router-luatest/replica_1_a.iproto I> will retry every 0.10 second
```

</details>

We do need this change in order to support reloading of SSL certificate files without changing the file names.

I don't know if it's a vshard issue or a test issue, but if it's a test issue I think we should fix this. Please investigate.

Gerold103 commented 1 year ago

The reason is that your PR breaks this scenario:

```lua
--
-- Instance 1
--
-- Step 1
--
cfg = {
    listen = 3313,
    replication = {
        'storage:storage@localhost:3313',
        'storage:storage@localhost:3314',
    },
    replication_timeout = 0.1,
}
box.cfg(cfg)
box.schema.user.grant('guest', 'super', nil, nil, {if_not_exists = true})

--
-- Instance 2
--
-- Step 2
--
cfg = {
    listen = 3314,
    replication = {
        'storage:storage@localhost:3313',
        'storage:storage@localhost:3314',
    },
    replication_timeout = 0.1,
}
box.cfg(cfg)
box.schema.user.grant('guest', 'super', nil, nil, {if_not_exists = true})
```

Then you wait to see which one boots. On the booted one, do this:

```lua
box.cfg(cfg)
box.schema.user.create('storage', {
    password = 'storage',
    if_not_exists = true,
})
box.schema.user.grant('storage', 'super')
```

Before your PR this last box.cfg(cfg) was a no-op. But now it causes a replication sync from scratch. Syncing obviously fails, because the other storage is not booted yet. It doesn't look right that you try to sync replication in this call. Look at the logs:

```
tarantool> box.cfg(cfg)
2023-04-12 23:55:32.131 [13697] main/103/interactive I> tx_binary: stopped
2023-04-12 23:55:32.131522+0200 tarantool[13697:10328874] [si_destination_compare] send failed: Invalid argument
2023-04-12 23:55:32.131544+0200 tarantool[13697:10328874] [si_destination_compare] send failed: Invalid argument
2023-04-12 23:55:32.132 [13697] main/103/interactive I> tx_binary: bound to [::]:3314
2023-04-12 23:55:32.132 [13697] main/103/interactive/box.load_cfg I> set 'listen' configuration option to 3314
2023-04-12 23:55:32.132 [13697] main/103/interactive/box.load_cfg I> set 'replication_timeout' configuration option to 0.1
2023-04-12 23:55:32.132 [13697] main/103/interactive I> connecting to 2 replicas
2023-04-12 23:55:32.133 [13697] main/191/applier/storage@localhost:3314 I> remote master c50a33ac-9134-4ab8-bc65-5d5d99b600de at [::1]:3314 running Tarantool 3.0.0
2023-04-12 23:55:32.133 [13697] main/190/applier/storage@localhost:3313 I> remote master f8675a37-aec6-4ef4-9b7d-184c6d5d1ade at [::1]:3313 running Tarantool 3.0.0
2023-04-12 23:55:32.137 [13697] main/103/interactive I> connected to 2 replicas
2023-04-12 23:55:32.137 [13697] main/103/interactive I> synchronizing with 2 replicas
2023-04-12 23:55:32.137 [13697] main/190/applier/storage@localhost:3313 I> failed to authenticate
2023-04-12 23:55:32.137 [13697] main/190/applier/storage@localhost:3313 box.cc:3694 E> ER_LOADING: Instance bootstrap hasn't finished yet
2023-04-12 23:55:32.137 [13697] main/190/applier/storage@localhost:3313 I> will retry every 0.10 second
```

Why did you rebind? The listen argument didn't change. Replication timeout didn't change either. Nor the replication settings. And the replication was feeling fine - it wasn't stopped. It was trying to reconnect.

The case might look artificial, but it is real. The box.cfg(cfg) in the last code snippet is actually done inside vshard.storage.cfg(). Yet it might not be the first box.cfg call, because some people prefer to call box.cfg manually before vshard.storage.cfg() for whatever reason.
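
A minimal sketch of that pattern, assuming illustrative URIs and a placeholder vshard config (vshard_cfg and instance_uuid below are not from this thread):

```lua
local vshard = require('vshard')

-- The user configures the box manually first...
box.cfg{
    listen = 3313,
    replication = {
        'storage:storage@localhost:3313',
        'storage:storage@localhost:3314',
    },
    replication_timeout = 0.1,
}

-- ...and only later calls vshard.storage.cfg(), which internally invokes
-- box.cfg() again with essentially the same listen/replication values.
-- With the unconditional re-apply, this second box.cfg() restarts
-- replication and may block waiting for sync.
vshard.storage.cfg(vshard_cfg, instance_uuid)
```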

locker commented 1 year ago

> Why did you rebind? The listen argument didn't change. Replication timeout didn't change either. Nor the replication settings. And the replication was feeling fine - it wasn't stopped. It was trying to reconnect.

Rebind looks safe in the case of vshard: disabling the box.cfg.listen update when the old and new values are the same (by patching the load_cfg.lua code) doesn't help the vshard tests. OTOH, if I disable the replication update, the vshard tests pass.

I think that if the user calls box.cfg, they expect the parameters to be re-applied. True, the underlying configuration callback might skip the update, but only the callback can decide this - comparing Lua values isn't enough. Even if the URI is the same, we might still need to take an action (in the case of listen - rebind). For example, consider the case when the user updates the SSL certificate file. The URI doesn't change, but we do need to rebind to pick up the new certificate (https://github.com/tarantool/tarantool-ee/issues/432). Another example is when replication is broken because of a conflict. After resolving the conflict, the user has to disable replication and then re-enable it instead of just calling box.cfg{replication = box.cfg.replication}, which is inconvenient and confusing.
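
For reference, the workaround mentioned above currently looks roughly like this (a sketch, not taken from the thread):

```lua
-- Re-submitting the same value does nothing today, so broken
-- replication is not restarted:
box.cfg{replication = box.cfg.replication}  -- no-op

-- The user has to bounce the option to force the appliers to restart:
local saved = box.cfg.replication
box.cfg{replication = {}}     -- tear replication down
box.cfg{replication = saved}  -- set it up again from scratch
```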

We could introduce a box.ctl knob to restart replication or rebind, like it was suggested in https://github.com/tarantool/tarantool/issues/8551#issuecomment-1503733925, but I think we should avoid that if we can.

Can we fix the replication configuration callback to handle this case properly? We could compare the new URIs with the URIs of the currently running appliers, but IMO that would be rather ugly. Maybe we could disable sync on box.cfg if replication is already failing to connect?
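
Purely as an illustration of the first idea (all names here are hypothetical; this is not actual Tarantool code), the comparison could look roughly like this:

```lua
-- Hypothetical sketch: decide whether the replication callback has to
-- restart the appliers and run the blocking sync step.
local function replication_needs_restart(new_uris, running_applier_uris)
    if #new_uris ~= #running_applier_uris then
        return true
    end
    for i, uri in ipairs(new_uris) do
        if uri ~= running_applier_uris[i] then
            return true
        end
    end
    -- Same URI set: the appliers could keep running and box.cfg() would
    -- not have to block for the sync timeout.
    return false
end
```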

Gerold103 commented 1 year ago

> Can we fix the replication configuration callback to handle this case properly?

In this particular case I don't see why reapplying it is needed at all. The replication is not dead and didn't change. Why re-create it from scratch? It won't change anything except for blocking box.cfg for sync timeout seconds.

locker commented 1 year ago

https://github.com/tarantool/tarantool/issues/8551#issuecomment-1511633900