tarantool / vshard

The new generation of sharding based on virtual buckets

Mark the rebalancing transactions #372

Closed sergos closed 1 year ago

sergos commented 1 year ago

Bucket migration looks like regular client data activity and is indistinguishable in the replication flow. Although a proper solution should be provided at a different level, we can offer a temporary one for the number of clients eager to have efficient data change capture. Since all rebalancing activities are done in transactions, we can add a fake operation at the beginning of each such transaction, so that a client only has to decode this first operation to skip all following operations in the transaction without decoding them. The operation can be a regular DML into a dedicated system space. There is only one limitation: the space must be memtx or vinyl depending on the transaction, since Tarantool does not support multi-engine transactions. The mark operation itself is optional and can be omitted when the dedicated system space is not present.

In bucket_recv_xc and in gc_bucket_in_space_xc the call to box.begin() should be followed by an operation in a system space named _vshard_rebalancing_{memtx|vinyl}, depending on the engine of the space where the current bucket resides.
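A minimal sketch of the idea (the space name `_vshard_rebalancing_<engine>` and the helper below are assumptions taken from this proposal, not existing vshard code):

```lua
-- Hypothetical sketch: open a rebalancing transaction with a fake
-- first operation, so a reader of the replication stream only has to
-- decode this first op to skip the whole transaction.
local function begin_marked_txn(engine)
    box.begin()
    local mark_space = box.space['_vshard_rebalancing_' .. engine]
    if mark_space ~= nil then
        -- The mark is a regular DML into the dedicated system space.
        -- It must live in the same engine as the bucket data, since
        -- Tarantool does not support multi-engine transactions.
        mark_space:replace{1}
    end
    -- ... bucket data operations follow in the same transaction,
    -- finished by box.commit() ...
end
```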

Serpentian commented 1 year ago

We have two possible implementations of this "feature":

1. Automatic space creation

If we want all spaces to be created without user intervention, we can add a new option, migration_notify, to vshard.cfg. It would control whether vshard has to create the system spaces and write to them the first operation of every rebalancer-related transaction, which can then be used to distinguish changes made by the rebalancer from the user's ones.

The problem here is that during vshard.storage.cfg we don't know which engines are used: we can either invoke find_sharded_space to find out which engines we need, or get this information from the user. However, spaces may not be created yet by the time vshard.cfg is executed, so user-defined engines seem to be the preferable solution.
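A hedged sketch of how the user-defined variant could look in the config (the migration_notify option and its shape are assumptions from this discussion, not an existing vshard option):

```lua
-- Hypothetical vshard.storage.cfg extension: the user lists the
-- engines for which vshard should create the notification spaces
-- and write the mark operation.
local cfg = {
    sharding = sharding_cfg, -- the usual sharding configuration
    migration_notify = {'memtx', 'vinyl'},
}
vshard.storage.cfg(cfg, instance_uuid)
```

This sidesteps the problem above: vshard no longer has to guess the engines from spaces that may not exist yet at cfg time.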

2. The user can create the spaces on their own

Gerold103 commented 1 year ago

There is an alternative, less intrusive and more universal solution. Let's introduce triggers which would be invoked inside the transaction that is going to GC or send bucket data.

The users would be able to create their own spaces, insert/update/replace them, and do whatever they want otherwise.

For vshard the win is that 1) we won't do this dirty hack with "special spaces" in the master branch, 2) the triggers might be used for various useful things.

By "various useful things" I mean:

For a start we would have to introduce triggers for "bucket GC" and "bucket send". We need to design how to expose them.

One way is to have one trigger per event. For example, vshard.storage.on_bucket_gc_txn(...), which would install/remove a callback called first in the bucket GC transactions, and vshard.storage.on_bucket_send_txn(...), which controls triggers called first in the bucket data send transactions.
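From the user side, the per-event variant might look like this (the trigger names follow the proposal above and do not exist in vshard yet):

```lua
local log = require('log')

-- Hypothetical per-event trigger API: one endpoint per event.
vshard.storage.on_bucket_gc_txn(function(bucket_id, space_id)
    -- Called first inside the bucket GC transaction.
    log.info('GC of bucket %d, space %d', bucket_id, space_id)
end)

vshard.storage.on_bucket_send_txn(function(bucket_id, space_id)
    -- Called first inside the bucket data send transaction.
    log.info('send of bucket %d, space %d', bucket_id, space_id)
end)
```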

Another approach is a single entry point: vshard.storage.on_bucket_event(...). These triggers would be called for all events, with the event name passed as the first argument. For example:

```lua
vshard.storage.on_bucket_event(function(event, ...)
    if event == 'bucket_data_gc_txn' then
        -- Handle it.
    elseif event == 'bucket_data_send_txn' then
        -- Handle it.
    elseif ...
    end
end)
```

Personally, I like the second way more. It is easier to extend: in the future we will have to add events like "bucket_data_recv", "bucket_state_change", etc., and having a separate trigger endpoint for each of them would be a nightmare.

Both solutions need to pass into the triggers at least the affected bucket id and space id.

Applied to the current ticket: users would have to create their own "migration notify spaces", subscribe to the bucket events, and on the relevant events write into those spaces whatever they want.
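Put together, a user-side sketch under the single entry point proposal might look like this (the migration_notify space, the on_bucket_event API, and the event names are assumptions from this thread, not shipped vshard features):

```lua
-- Hypothetical: a user-created space that marks rebalancer
-- transactions in the replication stream.
box.schema.space.create('migration_notify', {if_not_exists = true})
box.space.migration_notify:create_index('pk', {if_not_exists = true})

vshard.storage.on_bucket_event(function(event, bucket_id, space_id)
    if event == 'bucket_data_gc_txn' or
       event == 'bucket_data_send_txn' then
        -- This write lands in the same transaction as the bucket data
        -- changes, so a reader of the replication flow sees it and can
        -- skip the rest of the transaction without decoding it.
        box.space.migration_notify:replace{bucket_id, event, space_id}
    end
end)
```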

R-omk commented 1 year ago

> Let's introduce triggers

It might also solve this issue (bucket generation counter).

Gerold103 commented 1 year ago

Looks separate to me. We don't need generations for this particular ticket. Generations would cause a schema change: we would need to update the _bucket format for them.