The problem is the following:

- `map_callrw()` starts;
- Rebalancing starts and takes an rw-lock on some individual buckets. Then it sleeps, waiting for `map_callrw()` to end;
- New rw-requests to the locked buckets fail.

The rebalancer really competes only with map requests. It shouldn't be a problem to allow new per-bucket rw requests while `map_callrw()` is running. They are not supposed to be long, so they won't starve the rebalancer.

Moreover, it looks really strange that `map_callrw` promises you that the whole cluster is writable and yet you can't make any `callrw` requests, even to individual buckets.

The correct algorithm would be:

- `bucket_send_xc()` first makes simple sanity checks;
- Then it calls `lsched.move_start()`;
- Then it rw-locks the bucket and waits for it to have zero rw refs.
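The difference between the two orderings can be shown with a small toy model. This is only a sketch in Python, not vshard code: `ToyStorage`, `send_lock_first`, `send_schedule_first` and `rw_during_map` are hypothetical names, the sanity checks and the wait for zero rw refs are left out, and it assumes that `lsched.move_start()` is the step that blocks until the running `map_callrw()` ends.

```python
class ToyStorage:
    """Toy stand-in for one storage node; not vshard's real data model."""

    def __init__(self):
        self.map_ref_active = False   # True while a map_callrw() is running
        self.rw_locked = set()        # bucket ids currently rw-locked

    def callrw(self, bucket_id):
        """A per-bucket rw request: rejected only if the bucket is rw-locked."""
        return bucket_id not in self.rw_locked


def send_lock_first(storage, bucket_id):
    """Problematic order: lock the bucket, then wait for map requests."""
    storage.rw_locked.add(bucket_id)
    while storage.map_ref_active:
        yield "waiting for map_callrw"   # bucket stays locked during the wait
    yield "sending"
    storage.rw_locked.discard(bucket_id)


def send_schedule_first(storage, bucket_id):
    """Proposed order: wait for map requests first, lock the bucket afterwards."""
    while storage.map_ref_active:        # stands in for lsched.move_start()
        yield "waiting for map_callrw"   # bucket is still writable here
    storage.rw_locked.add(bucket_id)
    yield "sending"
    storage.rw_locked.discard(bucket_id)


def rw_during_map(send):
    """Start a map request, start the send, then try callrw on the same bucket."""
    storage = ToyStorage()
    storage.map_ref_active = True        # map_callrw() starts
    mover = send(storage, 1)
    next(mover)                          # rebalancer runs until its first wait
    accepted = storage.callrw(1)         # new rw request while map_callrw() runs
    storage.map_ref_active = False       # map_callrw() ends
    for _ in mover:                      # let the send finish
        pass
    return accepted


print(rw_during_map(send_lock_first))      # False
print(rw_during_map(send_schedule_first))  # True
```

In the first ordering the bucket is already locked while the rebalancer sleeps, so the rw request is rejected. In the second the lock is taken only after the map request has finished, so per-bucket rw requests keep working in the meantime.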
Reported by @R-omk.