Closed by JeffreyDevloo 8 years ago
Not sure what the reported issue is: if a customer wants to remove one node out of a two-node setup, that's perfectly fine; nothing will go wrong. If a node has, for some reason, crashed, we can't prohibit a customer from removing that crashed node from the cluster. And with an unreachable node in a cluster the customer can't do anything (most of the functionality is blocked at that point).
I don't consider going gracefully from 3 nodes to 1 something we need to tackle in the code (atm). Going from 3 to 2 and back to 3 should of course be covered. Going from 2 to 1 should NOT lead to corruption; if it does, we need to fix that and re-open this ticket.
@wimpers, as far as I know going from 3 to 1 is supported in the code, and if it is not, that is indeed considered a separate issue.
@JeffreyDevloo, were both nodes up when you executed the ovs remove? Any chance you still have the logs, so we can see why the cluster shrink didn't execute correctly?
Problem description
There is no check in 'ovs remove nodes' that prevents removing the majority of the nodes and thereby corrupting the cluster. For example, a single call against a 3-node cluster can take out 2 nodes at once, going straight from 3 to 1.
Possible root of the problem
There is no check, before executing the node removal, on the number of nodes that must survive for the cluster to stay healthy.
Possible solution
Add such a check: refuse an 'ovs remove nodes' call that would remove a majority of the cluster's nodes in one go (see the sketch below).
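To make the proposed check concrete, here is a minimal sketch assuming the rule that follows from the discussion above: removing a minority (e.g. 2 -> 1 or 3 -> 2) stays allowed, while removing a majority in a single call is refused. The helper names `quorum` and `removal_allowed` are hypothetical and not part of the Open vStorage API.

```python
# Minimal sketch of the proposed guard; names are hypothetical,
# not the actual Open vStorage code.

def quorum(cluster_size):
    """Smallest strict majority of a cluster of the given size."""
    return cluster_size // 2 + 1

def removal_allowed(cluster_size, remove_count):
    """Refuse any removal that would take out a majority of the nodes at once."""
    return remove_count < quorum(cluster_size)

if __name__ == "__main__":
    assert removal_allowed(2, 1)      # 2 -> 1: allowed, per the discussion above
    assert removal_allowed(3, 1)      # 3 -> 2: allowed
    assert not removal_allowed(3, 2)  # 3 -> 1 in one call: refused (majority removed)
    print("quorum checks passed")
```

Wiring this guard into the start of the remove-node flow would turn the corrupting 3 -> 1 shrink into an upfront error, while still permitting the graceful 2 -> 1 case the maintainers consider fine.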
Temporary solution
None
Additional information
Setup
Hyperconverged setup
Package information