Closed filonenko-mikhail closed 1 year ago
Actually, router and storage are not reloaded when we do something like this:
package.loaded['vshard'] = nil
local vshard = require('vshard')
As the user expects everything to be reloaded, I suppose we should implement an atomic reload of the whole vshard.
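To illustrate why the snippet above is not enough: Lua caches every module under its own key in package.loaded, so clearing only the 'vshard' entry presumably leaves submodules such as vshard.router and vshard.storage cached, and the next require('vshard') picks them up unchanged. A minimal sketch of dropping the whole family of entries is below. The helper name unload_with_submodules is hypothetical and not part of vshard, and this only touches Lua's module cache: any state kept outside it survives, so it is not the atomic reload proposed here.

```lua
-- Hypothetical helper (not a vshard API): drop a module and all of its
-- dotted submodules from Lua's module cache, so that the next require()
-- re-executes their source files.
local function unload_with_submodules(name)
    local prefix = name .. '.'
    local dropped = {}
    -- Collect matching keys first, then clear them.
    for key in pairs(package.loaded) do
        if type(key) == 'string' and
           (key == name or key:sub(1, #prefix) == prefix) then
            table.insert(dropped, key)
        end
    end
    for _, key in ipairs(dropped) do
        package.loaded[key] = nil
    end
    table.sort(dropped)
    return dropped
end

-- Intended usage (assumes vshard is installed):
--   unload_with_submodules('vshard')
--   local vshard = require('vshard')
```

The prefix includes the trailing dot on purpose, so an unrelated module whose name merely starts with the same letters is left alone.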
Speaking of restoring fibers after they are explicitly killed, we can do that in replicaset.rebind_replicasets. This will restore the connection when the router is reloaded. The other solution is to add a check for whether the connection's fiber is dead right here:
https://github.com/tarantool/vshard/blob/dd70cfb2c5ec36ab7d5355b0024e5f6d21bb8f9f/vshard/replicaset.lua#L173-L177
As this method is invoked in replicaset_master_call, fibers will be restored too.
Most of the replicaset methods, like rebind_replicasets(), are internal; people shouldn't use them in their code. A proper fix has two parts:
1) Make the core netbox report its worker fiber state as closed if the fiber is cancelled. I suspect it might be reported as error_reconnect or something, which is misleading: it is not reconnecting anymore. Or make netbox spawn a new fiber if the current one is cancelled.
2) replicaset_connect_to_replica() can check whether state == error_reconnect (or whatever the name is), and then also check the fiber state somehow (I don't know if the worker fiber state is reachable at all). If it is dead/cancelled, then create a new connection.
Users shouldn't need to bother with that.
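The decision in step 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: connection_needs_recreate is a hypothetical name, and worker_is_dead is a hypothetical predicate passed in by the caller, because, as noted above, it is unclear whether the worker fiber state is reachable at all. The only thing read from the connection is its state field.

```lua
-- Hypothetical sketch (not vshard code): decide whether a replica
-- connection should be thrown away and re-created.
--
-- conn           - a net.box-like connection object with a `state` field,
--                  or nil if no connection was ever made.
-- worker_is_dead - HYPOTHETICAL predicate supplied by the caller; it should
--                  return true when the connection's worker fiber is
--                  dead or cancelled.
local function connection_needs_recreate(conn, worker_is_dead)
    if conn == nil or conn.state == 'closed' then
        -- No usable connection at all.
        return true
    end
    if conn.state == 'error_reconnect' and worker_is_dead(conn) then
        -- The state claims a reconnect is pending, but nothing is left
        -- to perform it.
        return true
    end
    return false
end
```

Passing the predicate in keeps the open question (how to observe the worker fiber) separate from the decision itself, so the check can be swapped once netbox exposes that information.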
Hi
There is a case when something happens and a storage fiber is cancelled (e.g. by cartridge hot reload or any other fiber killer).
Some affected snippet
The question is: how do we restart the netbox connection under vshard.router? Or can this be done on the vshard side?