kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

Control plane healthchecks #1902

Closed randomvariable closed 4 years ago

randomvariable commented 4 years ago

Implement control plane healthchecks as part of #1756

randomvariable commented 4 years ago

/assign /lifecycle active

dlipovetsky commented 4 years ago

@randomvariable Are you creating some etcd utils as part of this? I ask because the control plane controller needs to talk to etcd to remove a member as part of deleting a control plane replica, and I'd like it to talk to etcd using the same mechanism as the healthchecks. The control plane CAEP mentions two options:

Running PodExec etcdctl, or port-forwarding to etcd to get etcd cluster health information

Do you have a preference? I think pod exec'ing makes it easier to access the necessary certificates, but port-forwarding lets us avoid shelling out to etcdctl.

ncdc commented 4 years ago

@dlipovetsky I believe @randomvariable is planning on using port-forward. He's working on a library, and the plan is to open a PR real soon.

randomvariable commented 4 years ago

/remove lifecycle-active

Currently working on https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/1490, in case someone else wants to finish off consuming #2031 and #2030.

randomvariable commented 4 years ago

/remove lifecycle-active

chuckha commented 4 years ago

@randomvariable I can take this! Why was #2031 closed? It seems to be closed with no note

randomvariable commented 4 years ago

Oh, I thought I left a comment. Mainly, I was going to move to a single PR with the consumption included, rather than having it as an abstract package whose API might need to change.

chuckha commented 4 years ago

ack

/assign

randomvariable commented 4 years ago

@dlipovetsky had some additional comments: because none of the alarms actually report connectivity, you can still have a network partition and not get an error. The suggestion is to do as etcdctl does and issue a get on a known key.

vincepri commented 4 years ago

@chuckha @randomvariable Can this be closed in favor of #2243?

chuckha commented 4 years ago

whoops, yep, duplicate, replaced by #2243

/closing

chuckha commented 4 years ago

/close

😑

k8s-ci-robot commented 4 years ago

@chuckha: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/1902#issuecomment-585399246):

> /close
>
> 😑

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.