upgrade: seamless database upgrade from 2.x to 3.x

Totktonada commented 1 month ago

Context

A typical upgrade scenario from 2.x to 3.x is really difficult in various ways.

A typical user uses the cartridge framework, but it refuses[^1] to work on tarantool 3.x (https://github.com/tarantool/cartridge/pull/2130). OK, the application code needs some updates.

[^1]: I guess it is a kind of political decision, because I don't know of any solid technical reasons to do so.

- Cartridge's configuration has to be adapted to the new declarative configuration format (https://github.com/tarantool/tarantool/issues/8724).
- Application roles have to be adapted to the new format (https://github.com/tarantool/doc/issues/4280).
- cartridge.rpc and cartridge.connpool usages have to be adapted to the experimental.connpool module (https://github.com/tarantool/doc/issues/4129).

At the same time, new versions of modules like vshard, crud, and expirationd are brought in, because the old ones don't offer tarantool 3.x roles. tarantoolctl is not shipped anymore, so tt is used. etcd v2 (supported by cartridge) is replaced with etcd v3 (supported by centralized configuration in tarantool EE 3.x). The new failover coordinator may be used instead of cartridge's failover. If a custom router/balancer/other service works together with the tarantool cluster, it has to be adapted too.

I mean, a lot of code across the project is changed.

OK, all the code is updated and works with tarantool 3.x when a fresh database is created.

It is time to set up a staging environment and practice upgrading from tarantool 2.x snapshots. We did the hard work and this last step is simple, isn't it? Nope.

Let's assume that the usual 'zero downtime' upgrade procedure is performed:

1. A 2.x replica is stopped, and a 3.x replica is started on its snapshots.
2. This is repeated for all the replicas. The master is kept on 2.x.
3. The 2.x master is switched to RO, and one of the 3.x replicas is switched to RW.
4. The last 2.x instance (the old master) is stopped and started as a 3.x replica.

This way a replicaset is accessible for read requests all the time and accessible for write requests almost all the time -- except the master switch step.
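To make the expected availability concrete, here is a minimal Python sketch that replays the procedure on a toy model and records whether the replicaset is readable and writable after every state change. The `Instance` class and the `rolling_upgrade` helper are hypothetical names for illustration only; nothing here talks to a real tarantool instance, and it models the intended behaviour, not the failures described later.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    version: str = "2.x"
    up: bool = True
    rw: bool = False


def availability(instances):
    """Return (readable, writable) for the replicaset as a whole."""
    readable = any(i.up for i in instances)
    writable = any(i.up and i.rw for i in instances)
    return readable, writable


def rolling_upgrade(instances):
    """Replay the procedure and log availability after every state change."""
    log = []

    def record(step):
        log.append((step, *availability(instances)))

    master = next(i for i in instances if i.rw)
    replicas = [i for i in instances if i is not master]

    # Restart every replica on 3.x, one at a time; the master stays on 2.x.
    for r in replicas:
        r.up = False
        record(f"stop {r.name}")
        r.version, r.up = "3.x", True
        record(f"start {r.name} on 3.x")

    # Switch the master: writes are unavailable between these two changes.
    master.rw = False
    record(f"{master.name} -> RO")
    replicas[0].rw = True
    record(f"{replicas[0].name} -> RW")

    # Restart the old master as a 3.x replica.
    master.up = False
    record(f"stop {master.name}")
    master.version, master.up = "3.x", True
    record(f"start {master.name} on 3.x")
    return log


replicaset = [Instance("a", rw=True), Instance("b"), Instance("c")]
log = rolling_upgrade(replicaset)
for step, readable, writable in log:
    print(f"{step:20} read={readable} write={writable}")
```

In this model the only entry with `write=False` is the one right after the old master goes RO, which is the "master switch step" mentioned above.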

Expectations

A new functionality offered by tarantool sometimes needs updating a database schema: some new tuple in a system space or a new system space, for example. This functionality can't work on the old schema. However, all the old mechanisms should continue to work. A quote from our release policy:

Backward compatibility (binary data layout)

A newer release (its runtime) is backward compatible in this sense with an older one when the newer release is operational when working on top of data (*.xlog, *.snap, *.vylog, *.run) from the older release. All functionality that is part of the older release is working in this configuration.

An attempt to use a new functionality either successful or give a meaningful error until the database schema upgrade: it does not lead to a service outage or data corruption. An instance is able to upgrade the data layout using the box.schema.upgrade() call to enable all features of the new release (when all instances of the replicaset are run on the same tarantool version).

<...>

Backward compatibility (replication protocol)

A instance that is run on a newer release may work as upstream (master) of an instance with an older release or as downstream (replica) without database schema upgrade.

The database schema upgrade (box.schema.upgrade()) must be performed when all replicaset instances run on the same tarantool version. The upgrade does not cause downtime (if the application does not lean on internal schema representation).

Now the problems appear.

The problems

As said above, an application may change significantly during adaptation to 3.x: new spaces and Lua functions appear due to a module update or an update of the application's logic, new HTTP and iproto endpoints are added for metrics and balancers, and so on.

The configuration now contains permissions for the application's exposed functions and spaces. Modules/roles (crud/vshard) wait for RW status to finish initialization and expose the needed functions. The failover coordinator waits for the failover.execute function to become accessible before it starts managing the cluster.

But no new functions can be registered and no new permissions can be granted, because these operations are considered DDL, and DDL is forbidden on a non-upgraded instance: #7149. Switching the master to 3.x fails in various ways: in tarantool's code (#9849), in module code (https://github.com/tarantool/crud-ee/issues/16), and in application code.

In fact, #7149 violates the compatibility rules quoted above.

Analysis

Let's look at the database schema upgrades that are the reason for these problems.

https://github.com/tarantool/tarantool/blob/6b484622259c01a2468b1f248dd6f1bcdc227021/src/box/lua/upgrade.lua#L1368-L1400

- The tuple in _schema that holds the replicaset UUID gets a nicer name: replicaset_uuid instead of cluster.
- The third column of the _cluster system space now has a name in the space format: name.

https://github.com/tarantool/tarantool/blob/6b484622259c01a2468b1f248dd6f1bcdc227021/src/box/lua/upgrade.lua#L1402-L1421

- Tuples in the _func system space now have {} in the 20th column.
- The _func system space and the _vfunc system view now have a nicely named field, trigger, in the space format.

All of that is purely cosmetic!

And it produces a lot of real-world problems.

Possible solutions

I'm not deep in the context of the features for which the upgrades were written, so I'll try to follow common sense.

In any solution we should tune the runtime to understand the old database schema:

- The replicaset_uuid and cluster keys in _schema are treated as equivalent. A write is performed with the old name if the schema version is old.
- The new column names from the space formats are not used internally; a field ID is used instead.
- The value of the 20th column in _func is assumed to be {} if it does not exist.
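A rough Python model of these three rules is sketched below, only to show how little logic is involved. The key names follow the Analysis section (replicaset_uuid replacing cluster). The `NEW_SCHEMA` threshold, the function names, and the use of a plain dict for _schema and a list for a _func tuple are all assumptions for illustration, not tarantool internals.

```python
OLD_UUID_KEY = "cluster"
NEW_UUID_KEY = "replicaset_uuid"
NEW_SCHEMA = (3, 0, 0)  # assumed schema version at which the rename applies
FUNC_TRIGGER_FIELD_NO = 20  # fields are addressed by number, not by name


def read_replicaset_uuid(schema):
    """Return the UUID stored under either key name, or None."""
    for key in (NEW_UUID_KEY, OLD_UUID_KEY):
        if key in schema:
            return schema[key]
    return None


def uuid_key_for_write(schema_version):
    """Pick the key name that matches the on-disk schema version."""
    return NEW_UUID_KEY if schema_version >= NEW_SCHEMA else OLD_UUID_KEY


def normalize_func_tuple(tup):
    """Assume {} for the 20th column when an old tuple lacks it."""
    if len(tup) == FUNC_TRIGGER_FIELD_NO - 1:
        return [*tup, {}]
    return list(tup)


old_schema = {"cluster": "uuid-1"}
old_func = list(range(19))  # placeholder values for a 19-field tuple

print(read_replicaset_uuid(old_schema))
print(uuid_key_for_write((2, 11, 1)))
print(normalize_func_tuple(old_func)[FUNC_TRIGGER_FIELD_NO - 1])
```

With readers normalized this way, code above the storage layer never needs to know whether the schema was upgraded.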

Possible solutions:

1. Drop the upgrades.
2. Mark the upgrades in a way that doesn't forbid DDL (some kind of 'minor upgrade').
3. Forbid only the DDL that is related to the remaining upgrades.

instance/replicaset/group names

The new persistent names (#5029) are a separate problem. We can't write them to the database before the upgrade. However, module or application code may expect 3.x to report them via box.info.name and box.info.replicaset.name (see https://github.com/tarantool/crud-ee/issues/16 for example). I guess that a new column in the _cluster system space (the one that holds the replica names) doesn't break 2.x. Can we just write the names before the upgrade?

yngvar-antonsson commented 1 month ago

It wasn't a political decision to forbid using cartridge with 3.x. Cartridge manipulates the Tarantool configuration a lot (mostly during bootstrap). To adapt Cartridge to the new way of configuring instances and replicasets, we (a one-man-army team) would need to spend hundreds of hours. I tried to adapt Cartridge at least to the new bootstrap_strategy, but I failed. Maybe I'm not the right person to perform such changes, but at least I tried. On the other hand, I worked on upgrading from 2.x to 3.x, and it was quite painful. It took me, a so-called Cartridge expert, about 20 hours to understand how to upgrade Tarantool properly. And I wonder how painful it would be for our clients. Right now I would suggest that anyone who uses Cartridge create a new cluster and somehow copy the data from the previous version. IMHO, since we promise our users that 3.x is compatible with the previous versions, we should provide at least a tool to easily upgrade a Tarantool cluster to the new version.

sergepetrenko commented 1 month ago

In theory we could track which exact spaces are changed during a schema upgrade and only forbid DDL touching these spaces.

For example, spaces _user and _priv aren't changed between versions 2.11 and 3.1, so the user may continue writing to them even with an old schema (2.11).

But my suggestion wouldn't fix the issue you mention completely: space _func is changed, so it would still be forbidden to register new functions there before the upgrade.
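A minimal Python sketch of that per-space check, purely for illustration: the set of changed spaces is taken from the Analysis section of this issue (_schema, _cluster, _func, _vfunc), and the `ddl_allowed` function is a hypothetical name, not an existing tarantool API.

```python
# Spaces touched by the 2.11 -> 3.x schema upgrade, per the analysis above.
CHANGED_BY_UPGRADE = frozenset({"_schema", "_cluster", "_func", "_vfunc"})


def ddl_allowed(space_name, schema_is_old):
    """Allow DDL unless the schema is old AND the upgrade changes the space."""
    return not (schema_is_old and space_name in CHANGED_BY_UPGRADE)


for space in ("_user", "_priv", "_func"):
    print(space, ddl_allowed(space, schema_is_old=True))
```

Under this rule grants (_user, _priv) go through on an old schema, while registering a function (_func) is still rejected, which is exactly the limitation noted above.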

sergepetrenko commented 1 month ago

@Totktonada, have I understood the problem correctly?

1. The config module only works with new module versions.
2. New module versions expose new persistent functions and grants for them.
3. DDL changes can be done only after a box.schema.upgrade().
4. All the new modules do not work unless box.schema.upgrade() is issued, and this creates downtime from the moment the first node is upgraded to the moment when the last one is.

If yes, is it possible to switch to new module versions while still using Tarantool 2.x? This way all the necessary DDL would be complete by the moment 3.x upgrade takes place, no?

Totktonada commented 1 month ago

> In theory we could track which exact spaces are changed during a schema upgrade and only forbid DDL touching these spaces.
>
> For example, spaces _user and _priv aren't changed between versions 2.11 and 3.1, so the user may continue writing to them even with an old schema (2.11).

It seems logical to me.

> But my suggestion wouldn't fix the issue you mention completely: space _func is changed, so it would still be forbidden to register new functions there before the upgrade.

The runtime can write tuples to this space using the old format (in the general sense of this word) if old runtime versions do not understand tuples in the new format. If the tuple format is changed in a backward-compatible way (for example, a new column is added or a new field in a map is added), the runtime can just always write the tuples in the new format.

This way there is no reason to forbid writing to _func.
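The rule can be written down as a small Python sketch. Everything here is hypothetical: the 19-field width of an old _func tuple is inferred from the "20th column" mentioned in the analysis, and `func_tuple_for_write` is an illustrative name, not a tarantool function.

```python
OLD_FUNC_FIELDS = 19  # assumed width of a _func tuple before the upgrade


def func_tuple_for_write(new_tuple, schema_is_old, old_runtime_reads_new_format):
    """Choose the on-disk shape of a _func tuple given in the new format.

    If older runtimes tolerate the extra column, always write the new
    format; otherwise fall back to the old one until the schema upgrade.
    """
    if schema_is_old and not old_runtime_reads_new_format:
        return list(new_tuple[:OLD_FUNC_FIELDS])
    return list(new_tuple)


new_tuple = list(range(19)) + [{}]  # placeholder values plus the new column
print(len(func_tuple_for_write(new_tuple, True, False)))
print(len(func_tuple_for_write(new_tuple, True, True)))
```

Either branch produces a tuple that every running version can read, so the write itself does not have to be rejected.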

> @Totktonada, have I understood the problem correctly?
>
> <...>

TBH, I haven't met this particular problem; however, it is possible.

> If yes, is it possible to switch to new module versions while still using Tarantool 2.x? This way all the necessary DDL would be complete by the moment 3.x upgrade takes place, no?

It seems to be true. However, I would note that upgrading both the 2.x and 3.x application variants to the new modules and performing one more upgrade phase still complicates the upgrade procedure.

The problems I remember:

- A complicated procedure for persisting the configured names: #9255.
- Configured privileges/roles and passwords from the config aren't applied. The main problem now is #9849: once the applying is postponed, it becomes simpler. Upgrading to 3.x is currently only possible without the credentials section in the config (assuming the non-upgraded instance must work correctly). This way the 3.x variant of the application can't expose any functions that were not exposed by the 2.x application.
  - @yngvar-antonsson met it in a demo application: https://github.com/tarantool/vshard/issues/476
  - @Satbek met it with tarantooldb in several places (an attempt to set a password from the config, an attempt to register new functions with a persistent body).
  - I can't grant failover.execute before the upgrade to enable the failover coordinator (should be solved by #10310).
- An attempt to use box.info.replication.name before an upgrade (https://github.com/tarantool/crud-ee/issues/16). May be solved by #10308 from our side.

Serpentian commented 2 weeks ago

Conclusion

Since the issue is assigned to the core team, I'm transforming it into an epic. Here are the tickets we decided to do in the scope of the seamless upgrade from 2.x to 3.x:

tarantool / tarantool