Open Gerold103 opened 2 years ago
I think that it's possible to implement this method with at least one of the following two variants (if it's hard to fix #173):
During resharding, make callro
behave similar to callrw
, that is, go to the master, block bucket movements, etc. (however, the masters at this point will be loaded more than usual)
During resharding, make a request to the master to temporarily block the movement of buckets, then make a request to the replicas for data (it seems very slow due to network trips, but if there is a lot of data, then the load will be distributed among the replicas)
Abort, that is actually quite complicated even after #173. Because, just like map_callrw()
, it still requires all buckets to be ACTIVE
everywhere. This fact in turn means, that either map_callro()
should in each replicaset talk to 2 nodes - master (to prevent bucket moves) and replica (to execute "map"), or vshard.storage.bucket_send()
must do it. Both options are bad and can't do it in bulks - it would be either too slow or too unfair in "map vs move" contention, need to spend some time on design.
Another idea would be to talk to replicas only. And if one of replicas responds that the buckets are in move, then the router can transparently go the masters too to contend with bucket moves. That would make it able to work with just replicas when there is no rebalancing.
if one of replicas responds that the buckets are in move, then the router can transparently go the masters too to contend with bucket moves
In a case of the async replication it is possible that only the master instance knows about bucket movement. Replicas would receive this status later due to async replication lag. Looks like this approach doesn't work.
In #261 is described what is consistent map-reduce. It is also implemented for RW requests in
map_callrw
function. However the ro-version wasn't done because it can't work reliably until #173 is fixed. Once the latter is repaired, the ro-version can be done relatively easy I think.