Closed by sergos 1 year ago
We have two possible implementations of this "feature":
1. Automatic space creation
If we want all spaces to be created without user intervention, we can add a new option, migration_notify, to vshard.cfg. It would tell vshard whether it has to create the system spaces and write into them as the first operation of every rebalancer-related transaction, which can be used to distinguish changes made by the rebalancer from the user's.
The problem here is that during vshard.storage.cfg we don't know which engines are used: we can either invoke find_sharded_space and find out which engines we need, or get this information from the user. However, spaces may not have been created by the time vshard.cfg is executed, so user-defined engines seem to be the preferable solution.
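A minimal sketch of how user-declared engines might drive the space creation. The shape of migration_notify (a list of engine names), the helper name create_marker_spaces, the single-field tuple format, and the _vshard_rebalancing_<engine> naming are all assumptions for illustration, not part of vshard. The box object is passed in as a parameter so the helper can be exercised outside Tarantool.

```lua
-- Hypothetical helper: nothing here is part of the vshard API.
local VALID_ENGINES = {memtx = true, vinyl = true}

-- box_api is expected to behave like Tarantool's global `box`.
-- engines is the assumed value of the proposed migration_notify option,
-- e.g. {'memtx', 'vinyl'}.
local function create_marker_spaces(box_api, engines)
    local created = {}
    for _, engine in ipairs(engines or {}) do
        if not VALID_ENGINES[engine] then
            error('migration_notify: unknown engine ' .. tostring(engine))
        end
        local name = '_vshard_rebalancing_' .. engine
        local space = box_api.schema.space.create(name, {
            engine = engine,
            if_not_exists = true,
        })
        -- A single unsigned key (the bucket id) is an assumed format.
        space:create_index('pk', {
            parts = {{1, 'unsigned'}},
            if_not_exists = true,
        })
        table.insert(created, name)
    end
    return created
end
```

Inside Tarantool this could be called as create_marker_spaces(box, {'memtx', 'vinyl'}) once the instance is writable.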
2. The user can create spaces on his own
There is an alternative, less intrusive and more universal solution. Let's introduce triggers which would be invoked inside the transaction that is going to GC or send bucket data.
The users would be able to create their own spaces, insert/update/replace into them, and do whatever else they want.
For vshard the win is that 1) we won't do this dirty hack with "special spaces" in the master branch, 2) the triggers might be used for various useful things.
As "various useful things" I mean:
To start, we would have to introduce triggers for "bucket GC" and "bucket send". We need to design how to expose them.
One way is to have one trigger per event. For example, vshard.storage.on_bucket_gc_txn(...), which would install/remove a callback called first in the bucket GC txns, and vshard.storage.on_bucket_send_txn(...), which controls triggers called first in the bucket data send txns.
Another approach is a single entry point: vshard.storage.on_bucket_event(...). These triggers would be called for all events, with the event name passed as the first argument. For example:
    vshard.storage.on_bucket_event(function(event, ...)
        if event == 'bucket_data_gc_txn' then
            -- Handle it.
        elseif event == 'bucket_data_send_txn' then
            -- Handle it.
        end
        -- Further events would get their own branches.
    end)
Personally, I like the second way more. It is easier to extend. In the future we will have to add events like "bucket_data_recv", "bucket_state_change", etc. Having a separate trigger endpoint for each of them would be a nightmare.
Both solutions need to pass into the triggers at least the affected bucket id and space id.
Applied to the current ticket: users would have to create their own "migration notify spaces", subscribe to the bucket events, and on the needed events write into those migration spaces whatever they want.
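The single-entry-point idea can be sketched in plain Lua as below. This is only an illustration under assumptions: the (new, old) install/remove signature, the run_bucket_event name, and the positional (event, bucket_id, space_id) arguments are hypothetical, and a plain table stands in for the user-created "migration notify" space.

```lua
-- Hypothetical registry; vshard does not expose these names.
local handlers = {}

-- Install `new` and/or remove `old` (assumed setter convention).
local function on_bucket_event(new, old)
    if old ~= nil then
        for i, h in ipairs(handlers) do
            if h == old then
                table.remove(handlers, i)
                break
            end
        end
    end
    if new ~= nil then
        table.insert(handlers, new)
    end
end

-- Would be called by the storage as the first step of a GC/send txn.
local function run_bucket_event(event, bucket_id, space_id)
    for _, h in ipairs(handlers) do
        h(event, bucket_id, space_id)
    end
end

-- User side: a plain table stands in for a user-created
-- "migration notify" space.
local migration_log = {}
local function notify(event, bucket_id, space_id)
    if event == 'bucket_data_gc_txn' or event == 'bucket_data_send_txn' then
        table.insert(migration_log, {event, bucket_id, space_id})
    end
end
on_bucket_event(notify)
```

With one registry, adding a new event name later does not require a new public function; handlers simply ignore the events they do not care about.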
Let's introduce triggers
It might also solve this issue (bucket generation counter)
Looks separate to me. We don't need generations for this particular ticket. Generations would cause a schema change: we would need to update the _bucket format for them.
Bucket migration looks like regular client data activity, indistinguishable in the replication flow. Although the proper solution belongs at a different level, we can provide a temporary one for the number of clients eager to have efficient change data capture. Since all rebalancing activities are done in transactions, we can add a fake operation at the beginning of the transaction, so that a client only has to decode this first operation to skip all the following ops in the transaction without decoding them. The operation can be a regular DML into a specific system space. There's only one limitation: the space should be memtx or vinyl depending on the transaction, since Tarantool doesn't support multi-engine transactions. The mark operation itself is optional and can be omitted if the specific system space is not present.
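On the consumer side, the skip logic might look roughly like this. The transaction representation (an array of ops, each with a space field) and the marker-space prefix are assumptions made for the sketch; a real consumer would read the space of the first operation from the replication stream.

```lua
-- Assumed prefix of the marker spaces; illustrative only.
local MARKER_PREFIX = '_vshard_rebalancing_'

-- Only the first operation is inspected; the rest stay undecoded.
local function is_rebalancer_txn(txn)
    local first = txn[1]
    return first ~= nil
        and first.space:sub(1, #MARKER_PREFIX) == MARKER_PREFIX
end

-- Keep only user transactions; rebalancer ones are dropped after
-- looking at their first operation.
local function user_transactions(txns)
    local result = {}
    for _, txn in ipairs(txns) do
        if not is_rebalancer_txn(txn) then
            table.insert(result, txn)
        end
    end
    return result
end
```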
In bucket_recv_xc and in gc_bucket_in_space_xc, the call to box.begin() should be followed by an operation in a system space named _vshard_rebalancing_{memtx|vinyl}, depending on the engine of the space where the current bucket's data is.
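A rough sketch of that marking step, not the actual vshard code. The function name rebalancer_txn, the apply callback, the one-field marker tuple, and the marker space name spelled _vshard_rebalancing_<engine> are assumptions; box_api stands for Tarantool's global box and is a parameter only so the function can be tried with a stub.

```lua
-- box_api is expected to behave like Tarantool's global `box`.
-- `apply` is a hypothetical callback doing the actual bucket data changes.
local function rebalancer_txn(box_api, space_id, bucket_id, apply)
    -- Pick the marker space matching the engine of the affected space.
    local engine = box_api.space[space_id].engine
    local marker = box_api.space['_vshard_rebalancing_' .. engine]
    box_api.begin()
    local ok, err = pcall(function()
        -- The mark is optional: skip it when the marker space is absent.
        if marker ~= nil then
            marker:replace({bucket_id})
        end
        apply()
    end)
    if not ok then
        box_api.rollback()
        error(err, 0)
    end
    box_api.commit()
end
```

Writing the mark before any data change keeps it the first operation of the transaction, which is what lets a consumer decide from one decoded op.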