randomvariable closed 4 years ago
/assign
/lifecycle active
@randomvariable Are you creating some etcd utils as part of this? The reason I ask is that the control plane controller needs to talk to etcd to remove a member when deleting a control plane replica. I'd like it to talk to etcd using the same mechanism as the healthchecks. The control plane CAEP mentions two options:
> Running PodExec etcdctl, or port-forwarding to etcd to get etcd cluster health information
Do you have a preference? I think pod exec'ing makes it easier to access the necessary certificates, but port-forwarding lets us avoid shelling out to etcdctl.
@dlipovetsky I believe @randomvariable is planning to use port-forwarding. He's working on a library and plans to open a PR soon.
/remove lifecycle-active
I'm currently working on https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/1490, in case someone else wants to finish off consuming #2031 and #2030.
/remove lifecycle-active
@randomvariable I can take this! Why was #2031 closed? It seems to have been closed without a note.
Oh, I thought I'd left a comment. I mainly wanted to do a single PR that includes the consumption, rather than having an abstract package whose API might need to change.
ack
/assign
@dlipovetsky raised an additional point: because none of the alarms actually report connectivity, a member can be network-partitioned without any error showing up. He suggested doing what etcdctl does and issuing a get on a known key.
@chuckha @randomvariable Can this be closed in favor of #2243?
whoops, yep, duplicate, replaced by #2243
/closing
/close
😑
@chuckha: Closing this issue.
Implement control plane healthchecks as part of #1756