Closed by JeffreyDevloo 8 years ago
Not sure what the reported issue is: if a customer wants to remove one node out of a two-node setup, that's perfectly fine; nothing will go wrong. If a node has, for some reason, crashed, we can't prohibit a customer from removing that crashed node from the cluster. And with an unreachable node in a cluster the customer can't do anything (most of the functionality is blocked at that point).
I don't consider going gracefully from 3 nodes to 1 something we need to tackle in the code (atm). Going from 3 to 2 and back to 3 should of course be covered. Going from 2 to 1 should NOT lead to corruption; if it does, we need to fix that and re-open this ticket.
@wimpers, as far as I know going from 3 to 1 is supported in the code, and if it is not, that is indeed considered a separate issue.
@JeffreyDevloo, were both nodes up when you executed the ovs remove? Any chance you still have the logs, so we can see why the cluster shrink didn't execute correctly?
Problem description
There is no check in 'ovs remove nodes' that prevents removing the majority of the nodes and thereby corrupting the cluster. For example, a single call against a 3-node cluster can take out 2 nodes at once, going straight from 3 to 1.
Possible root of the problem
There is no check, before executing the node removal, on the number of nodes that must survive for the cluster to stay healthy.
Possible solution
Add such a check: refuse an 'ovs remove nodes' call that would remove a majority of the cluster's nodes in one go (see the sketch below).
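To make the proposed check concrete, here is a minimal sketch assuming the rule that follows from the discussion above: removing a minority (e.g. 2 -> 1 or 3 -> 2) stays allowed, while removing a majority in a single call is refused. The helper names `quorum` and `removal_allowed` are hypothetical and not part of the Open vStorage API.

```python
# Minimal sketch of the proposed guard; names are hypothetical,
# not the actual Open vStorage code.

def quorum(cluster_size):
    """Smallest strict majority of a cluster of the given size."""
    return cluster_size // 2 + 1

def removal_allowed(cluster_size, remove_count):
    """Refuse any removal that would take out a majority of the nodes at once."""
    return remove_count < quorum(cluster_size)

if __name__ == "__main__":
    assert removal_allowed(2, 1)      # 2 -> 1: allowed, per the discussion above
    assert removal_allowed(3, 1)      # 3 -> 2: allowed
    assert not removal_allowed(3, 2)  # 3 -> 1 in one call: refused (majority removed)
    print("quorum checks passed")
```

Wiring this guard into the start of the remove-node flow would turn the corrupting 3 -> 1 shrink into an upfront error, while still permitting the graceful 2 -> 1 case the maintainers consider fine.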
Temporary solution
None
Additional information
Setup
Hyperconverged setup
Package information