tarantool / migrations

BSD 2-Clause "Simplified" License

Add is_healthy() check to up(): migrations currently fail if patch_clusterwide is running #54

Open vrogach2020 opened 2 years ago

vrogach2020 commented 2 years ago

If the clusterwide config has not yet been applied after the cluster starts, the migration fails. This is a common scenario in tests.

See the error:

code: 32
message: AtomicCallError: cartridge.patch_clusterwide is already running
stack traceback:
    /app/.rocks/share/tarantool/cartridge/twophase.lua:583: in function 'config_patch_clusterwide'
    /app/.rocks/share/tarantool/migrator.lua:105: in function 'up'
    eval:1: in main chunk
    [C]: at 0x006163c0

Maybe we could add an is_healthy() check or a timeout to up(), so that it waits until the config is applied?

For now I have to use the following workaround:

            // Start the container if it is not running yet.
            if (!container.isRunning()) {
                container.start();
            }

            // Poll cartridge.is_healthy() once per second, up to 30 attempts.
            boolean healthy = false;
            int attempts = 30;
            while (!healthy && attempts-- > 0) {
                List<?> result = container.executeCommand("return require('cartridge').is_healthy()").get();
                log.info("Checking cluster health status: {}, {}", result.size(), result);
                if (result.size() == 1) {
                    // Boolean.TRUE.equals() also copes with a null or non-boolean result.
                    healthy = Boolean.TRUE.equals(result.get(0));
                }
                if (!healthy) {
                    Thread.sleep(1000);
                }
            }

            if (!healthy) {
                throw new RuntimeException("Failed to get cluster into a healthy state");
            }
            // The cluster is healthy, so the migration can run.
            container.executeCommand("require('migrator').up()").get();

Totktonada commented 2 years ago

Waiting in up() until the clusterwide config is applied, or until a startup_timeout is reached, makes sense to me. This startup_timeout should apply only to waiting for the cluster to bootstrap, not to the migration itself. (The name of the option is open for discussion. Maybe bootstrap_timeout.)
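
To illustrate the separation described above, here is a minimal client-side sketch in Java, the language of the workaround. It is not the migrator's implementation: BootstrapWait, awaitHealthy, upWhenHealthy and the pollInterval parameter are hypothetical names made up for this example, and the health check and the migration are passed in as callbacks so that the sketch does not assume any particular client API. The point is only that the timeout bounds the wait for a healthy cluster and does not cover the migration call.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: bounds the bootstrap wait, not the migration itself. */
final class BootstrapWait {
    private BootstrapWait() {}

    /**
     * Polls healthCheck until it returns true or startupTimeout elapses.
     * Returns true if the cluster became healthy in time, false otherwise.
     */
    static boolean awaitHealthy(BooleanSupplier healthCheck,
                                Duration startupTimeout,
                                Duration pollInterval) throws InterruptedException {
        long deadline = System.nanoTime() + startupTimeout.toNanos();
        while (true) {
            if (healthCheck.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for the poll interval, but never past the deadline.
            long remainingMillis = Math.max(1, remainingNanos / 1_000_000);
            Thread.sleep(Math.min(pollInterval.toMillis(), remainingMillis));
        }
    }

    /** Runs the migration only after the wait succeeded; the timeout is already spent by then. */
    static void upWhenHealthy(BooleanSupplier healthCheck,
                              Runnable migration,
                              Duration startupTimeout,
                              Duration pollInterval) throws InterruptedException {
        if (!awaitHealthy(healthCheck, startupTimeout, pollInterval)) {
            throw new IllegalStateException("Cluster not healthy after " + startupTimeout);
        }
        migration.run();
    }
}
```

In the test setup from the workaround, the health check callback would wrap the is_healthy() command and the migration callback would wrap the require('migrator').up() command. A slow migration then cannot trip the timeout, because the deadline is only consulted while waiting for the cluster.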

We should also add the startup_timeout (or whatever we end up calling it) as a query parameter of the /migrations/up HTTP endpoint.

I'll add it to our backlog without a deadline. Reach out to me if you want to raise its priority.