hetznercloud / hcloud-cloud-controller-manager

Kubernetes cloud-controller-manager for Hetzner Cloud
Apache License 2.0

LB targets contain deleted node after replacing all k8s nodes #505

Closed (hwuethrich closed 1 year ago)

hwuethrich commented 1 year ago

We are using KubeOne and machine-controller to manage our k8s nodes.

Yesterday we upgraded our 8-node cluster from k8s 1.21 to 1.26, which caused the controller to add and remove a lot of targets on the LBs. In some cases the LBs were left in a broken state after all nodes had been replaced, and the controller was unable to update the targets:

E0907 07:32:48.631394       1 controller.go:781] failed to update load balancer hosts for service kube-ingress/ingress-nginx-controller: hcloud/loadBalancers.UpdateLoadBalancer: hcops/LoadBalancerOps.ReconcileHCLBTargets: target: : server with ID <redacted> not found (invalid_input)
I0907 07:32:48.631455       1 event.go:294] "Event occurred" object="kube-ingress/ingress-nginx-controller" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="UpdateLoadBalancerFailed" message="Error updating load balancer with new hosts map[cloud-fsn1-5544f9f46d-gskkp:{} cloud-fsn1-5544f9f46d-pzlxt:{} cloud-fsn1-5544f9f46d-thwtl:{} cloud-fsn1-5544f9f46d-vq5cp:{} cloud-nbg1-7c8d5d6f87-8s8c2:{} cloud-nbg1-7c8d5d6f87-b2wpx:{} cloud-nbg1-7c8d5d6f87-ddhv4:{} cloud-nbg1-7c8d5d6f87-mhm2x:{}]: hcloud/loadBalancers.UpdateLoadBalancer: hcops/LoadBalancerOps.ReconcileHCLBTargets: target: : server with ID <redacted> not found (invalid_input)"

In the HCloud Console, the LB showed an empty list of targets but "1 Server" at the top. I have now added the 8 existing nodes shown below to prevent an outage, but the deleted node is not visible in the UI and can't be removed:

[Screenshot 2023-09-07 at 09 37 58]

Deleting and recreating the LB fixes the issue, but the new LB gets a new IP address (which we would like to avoid). This is probably an HCloud API race condition, and I will open a ticket with Hetzner support. I am posting this mainly to make people aware of the issue.

apricote commented 1 year ago

Hey @hwuethrich,

Thanks for the report. We received multiple similar reports and identified an issue with the way LB targets are removed when a server is deleted. This was fixed on 19 October 2023 and should not happen anymore. If this exact issue still happens for you, it is best to create a support ticket, because the responsible team can then react to your issue directly and has access to internal logs etc. You can open a new support ticket here: https://console.hetzner.cloud/support