leo-project / leofs

The LeoFS Storage System
https://leo-project.net/leofs/
Apache License 2.0

[leo_manager] Manually recover the manager's RING when the storage nodes' RING is correct #1080

Open · mocchira opened 6 years ago

mocchira commented 6 years ago

This issue is for dealing with cases like https://github.com/leo-project/leofs/issues/1078.

Through the investigation of #1078, it turned out that we can recover the manager's RING just by issuing the command "leo_redundant_manager_api:create()." on remote_console, provided you are sure that the cluster member list is correct.

mocchira commented 5 years ago

We've come to the conclusion that we will implement this as a new leofs-adm command, "recover-manager-ring".

yosukehara commented 5 years ago

While considering this implementation, I noticed that LeoFS already has "recover-ring <storage-node>". I propose applying the "recover-ring" command to LeoGateway, LeoStorage, and LeoManager nodes alike, because the purpose of recovering the RING is the same whatever the node type.

mocchira commented 5 years ago

> While considering this implementation, I noticed that LeoFS already has "recover-ring <storage-node>". I propose applying the "recover-ring" command to LeoGateway, LeoStorage, and LeoManager nodes alike, because the purpose of recovering the RING is the same whatever the node type.

So you mean we will implement "leofs-adm recover-ring <manager-node>|<gateway-node>"? If so, I agree with you.

One thing we have to consider is which node (the manager or the storage nodes) the gateway node should retrieve the RING information from when executing "leofs-adm recover-ring <gateway-node>". Since that depends on the situation (i.e., which node holds the correct RING information), we could either add a new parameter "from" to specify the node from which the gateway node retrieves the RING information, or keep the command as is (no additional parameters) and require users to take care of the order of commands. For example, if the RING information on both the manager node and the gateway node was broken, a user would have to execute the following procedure:

```shell
### the execution order is important
### (if the order is reversed then it won't work)
$ leofs-adm recover-ring <manager-node>
$ leofs-adm recover-ring <gateway-node>
```

This may confuse users a little, so if we go this way we will have to document the procedure in detail; otherwise, we have to come up with another solution that is less confusing for users.
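If the order-dependent approach is kept, a small wrapper could at least make the order hard to get wrong. Below is a minimal sketch, not part of leofs-adm: "recover_rings" and DRY_RUN are hypothetical names, the node names are placeholders, and it assumes that the proposed manager/gateway forms of "leofs-adm recover-ring" exist (today the command targets storage nodes) and that a gateway retrieves its RING from the manager, as in the example above.

```shell
#!/usr/bin/env bash
# Sketch only: always recover the manager before any gateway.
# ASSUMPTION: the *proposed* "leofs-adm recover-ring <manager-node>" and
# "<gateway-node>" forms exist, and gateways fetch their RING from the manager.
# recover_rings and DRY_RUN are hypothetical; node names are placeholders.
set -eu

# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# recover_rings MANAGER [GATEWAY...]
recover_rings() {
  local manager="${1:?usage: recover_rings MANAGER [GATEWAY...]}"
  shift
  # Step 1: the manager's RING must be correct first ...
  run leofs-adm recover-ring "$manager"
  # Step 2: ... because each gateway then retrieves its RING from the manager.
  local gw
  for gw in "$@"; do
    run leofs-adm recover-ring "$gw"
  done
}

recover_rings manager_0@127.0.0.1 gateway_0@127.0.0.1
```

With the defaults, the script only prints "leofs-adm recover-ring manager_0@127.0.0.1" followed by "leofs-adm recover-ring gateway_0@127.0.0.1"; DRY_RUN=0 would only make sense once the manager/gateway forms of the command actually exist. A "from" parameter, as discussed above, would remove the need for such a wrapper.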

yosukehara commented 5 years ago

Related to https://github.com/leo-project/leofs/issues/1153#issuecomment-443431349