kinvolk / lokomotive

🪦 DISCONTINUED Further Lokomotive development has been discontinued. Lokomotive is a 100% open-source, easy to use and secure Kubernetes distribution from the volks at Kinvolk
https://kinvolk.io/lokomotive-kubernetes/
Apache License 2.0

The cluster apply command does not check etcd health #1383

Open pothos opened 3 years ago

pothos commented 3 years ago

Description

The lokoctl cluster apply command only checks whether the nodes are up, and it reports that the cluster is healthy even when one etcd member did not rejoin:

Now checking health and readiness of the cluster nodes ...

Node                      Ready    Reason          Message                            
[…]
…       True     KubeletReady    kubelet is posting ready status    

Success - cluster is healthy and nodes are ready!

as can be seen by running lokoctl health:

Name      Status    Message                                                                                                      Error    

etcd-1    False     Get "https://controller2.lokomotive:2379/health": dial tcp 172.24.213.3:2379: connect: connection refused             
etcd-2    True    {"health":"true"}        
etcd-0    True    {"health":"true"}        

Impact

In a cluster with an HA control plane, the user is not told that the etcd quorum may be at risk, because the output of lokoctl cluster apply suggests that the cluster is healthy. Only lokoctl health reveals the unhealthy member.
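To make the risk concrete, here is a small sketch of the quorum arithmetic. It assumes the usual majority rule for an etcd cluster (quorum is n/2 + 1 members, rounded down). The function names are hypothetical and not part of lokoctl.

```go
package main

import "fmt"

// quorum returns how many members must be healthy for an etcd cluster of
// the given size to keep accepting writes (a strict majority).
func quorum(members int) int {
	return members/2 + 1
}

// failuresTolerated returns how many more members may fail before quorum is
// lost, given the cluster size and the number of currently healthy members.
// A negative result means quorum is already lost.
func failuresTolerated(members, healthy int) int {
	return healthy - quorum(members)
}

func main() {
	// The situation from this issue: three members, one of them (etcd-1) down.
	fmt.Println(quorum(3))               // 2
	fmt.Println(failuresTolerated(3, 3)) // 1
	fmt.Println(failuresTolerated(3, 2)) // 0
}
```

With three members and one already down, the cluster still works but cannot survive another member failure, which is why a "healthy" report is misleading here.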

Environment and steps to reproduce

Without the fix in https://github.com/kinvolk/lokomotive/pull/1382, recreate a controller node. Its etcd member then does not rejoin the etcd cluster.

Expected behavior

Cluster.Verify should call the Cluster.Health function.
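A rough sketch of the idea, not the actual lokoctl code: the types and the verify function below are hypothetical stand-ins for the node readiness table and the lokoctl health rows shown above, and the node name is a placeholder. The only point is that verification fails when a component such as an etcd member is unhealthy, even if every node is ready.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// nodeStatus is a hypothetical stand-in for one row of the node readiness table.
type nodeStatus struct {
	Name  string
	Ready bool
}

// componentStatus is a hypothetical stand-in for one row of the
// `lokoctl health` output (for example: etcd-1, False, connection refused).
type componentStatus struct {
	Name    string
	Healthy bool
	Message string
}

// verify returns an error if any node is not ready or any control plane
// component is unhealthy. It returns nil only when both checks pass.
func verify(nodes []nodeStatus, components []componentStatus) error {
	var problems []string
	for _, n := range nodes {
		if !n.Ready {
			problems = append(problems, fmt.Sprintf("node %s is not ready", n.Name))
		}
	}
	for _, c := range components {
		if !c.Healthy {
			problems = append(problems, fmt.Sprintf("component %s is unhealthy: %s", c.Name, c.Message))
		}
	}
	if len(problems) == 0 {
		return nil
	}
	return errors.New(strings.Join(problems, "; "))
}

func main() {
	// Placeholder node name; all nodes report ready, as in the issue.
	nodes := []nodeStatus{{Name: "node-a", Ready: true}}
	components := []componentStatus{
		{Name: "etcd-0", Healthy: true},
		{Name: "etcd-1", Healthy: false, Message: "connection refused"},
		{Name: "etcd-2", Healthy: true},
	}
	if err := verify(nodes, components); err != nil {
		fmt.Println("cluster is not healthy:", err)
	}
}
```

Collecting all problems before returning, instead of stopping at the first one, lets the user see every unhealthy node and component in one run.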

Additional information

invidian commented 3 years ago

Actually, the API used for checking etcd health is deprecated, and I'm not sure if there is a replacement: #777.

Note that a functional monitoring stack will alert you if etcd has issues.