hetznercloud / hcloud-cloud-controller-manager

Kubernetes cloud-controller-manager for Hetzner Cloud
Apache License 2.0

Routes are not cleaned after scale down/node removal via cluster-autoscaler #734

Closed: pat-s closed this issue 4 weeks ago

pat-s commented 2 months ago

TL;DR

See title.

Expected behavior

Routes are removed again after the node is deleted.

Observed behavior

Routes are not removed and accumulate in the account, leading to node startup failures once the route limit (100?) is hit.

Minimal working example

No response

Log output

No response

Additional information

Initially posted in https://github.com/kubernetes/autoscaler/issues/7227

apricote commented 2 months ago

Hey @pat-s,

could you post the logs of hcloud-cloud-controller-manager and its configuration? Especially the networking part of the configuration is of interest.

pat-s commented 2 months ago

v1.20.0 running with

    Command:
      /bin/hcloud-cloud-controller-manager
      --cloud-provider=hcloud
      --leader-elect=false
      --allow-untagged-cloud
      --allocate-node-cidrs=true
      --cluster-cidr=10.42.0.0/16
      --webhook-secure-port=0
      --secure-port=10288
    Args:
      --allow-untagged-cloud
      --cloud-provider=hcloud
      --route-reconciliation-period=30s
      --webhook-secure-port=0
      --allocate-node-cidrs=true
      --cluster-cidr=10.244.0.0/16
      --leader-elect=false

Running on a k3s cluster deployed with terraform-hcloud-kube-hetzner.

k8s version: 1.29.8

fatelgit commented 2 months ago

+1 We are hitting the 100-route limit as well, which seems to be related to a lot of node scaling events. Some log output:

I0910 07:31:35.984663       1 route_controller.go:216] action for Node "k3s-autoscaled-cx32-nbg1-43ef964b133bcc7a" with CIDR "10.42.98.0/24": "keep"
I0910 07:31:35.984710       1 route_controller.go:216] action for Node "k3s-autoscaled-cx32-nbg1-4cdf23e2d9958bc0" with CIDR "10.42.8.0/24": "keep"
I0910 07:31:35.984724       1 route_controller.go:216] action for Node "k3s-autoscaled-cx32-nbg1-5797432f74cfeeb7" with CIDR "10.42.173.0/24": "add"
I0910 07:31:35.984732       1 route_controller.go:216] action for Node "k3s-autoscaled-cx32-nbg1-f976ab63fe71c24" with CIDR "10.42.96.0/24": "keep"
I0910 07:31:35.984740       1 route_controller.go:216] action for Node "k3s-agent-cx32-nbg1-xiq" with CIDR "10.42.4.0/24": "keep"
I0910 07:31:35.984747       1 route_controller.go:216] action for Node "k3s-autoscaled-cx32-nbg1-639592727b666e7a" with CIDR "10.42.1.0/24": "keep"
I0910 07:31:35.984753       1 route_controller.go:216] action for Node "k3s-control-plane-fsn1-cax21-aui" with CIDR "10.42.2.0/24": "keep"
I0910 07:31:35.984760       1 route_controller.go:216] action for Node "k3s-control-plane-fsn1-cax21-dec" with CIDR "10.42.3.0/24": "keep"
I0910 07:31:35.984766       1 route_controller.go:216] action for Node "k3s-control-plane-fsn1-cax21-glo" with CIDR "10.42.0.0/24": "keep"
I0910 07:31:35.984773       1 route_controller.go:216] action for Node "k3s-autoscaled-cx32-nbg1-27c56399330a466b" with CIDR "10.42.175.0/24": "add"
I0910 07:31:35.984841       1 route_controller.go:290] route spec to be created: &{ k3s-autoscaled-cx32-nbg1-5797432f74cfeeb7 false [{InternalIP 10.255.0.3} {Hostname k3s-autoscaled-cx32-nbg1-5797432f74cfeeb7} {ExternalIP 5.75.159.156}] 10.42.173.0/24 false}
I0910 07:31:35.984882       1 route_controller.go:290] route spec to be created: &{ k3s-autoscaled-cx32-nbg1-27c56399330a466b false [{InternalIP 10.255.0.4} {Hostname k3s-autoscaled-cx32-nbg1-27c56399330a466b} {ExternalIP 116.203.22.3}] 10.42.175.0/24 false}
I0910 07:31:35.984920       1 route_controller.go:304] Creating route for node k3s-autoscaled-cx32-nbg1-27c56399330a466b 10.42.175.0/24 with hint 37d3ec76-e77e-4059-bced-213b30b18df8, throttled 16.92µs
I0910 07:31:35.985003       1 route_controller.go:304] Creating route for node k3s-autoscaled-cx32-nbg1-5797432f74cfeeb7 10.42.173.0/24 with hint fff69d03-79ca-4233-ac69-2494b227cb68, throttled 19.96µs
E0910 07:31:36.071533       1 route_controller.go:329] Could not create route fff69d03-79ca-4233-ac69-2494b227cb68 10.42.173.0/24 for node k3s-autoscaled-cx32-nbg1-5797432f74cfeeb7: hcloud/CreateRoute: route limit reached (forbidden)
I0910 07:31:36.071640       1 event.go:389] "Event occurred" object="k3s-autoscaled-cx32-nbg1-5797432f74cfeeb7" fieldPath="" kind="Node" apiVersion="" type="Warning" reason="FailedToCreateRoute" message="Could not create route fff69d03-79ca-4233-ac69-2494b227cb68 10.42.173.0/24 for node k3s-autoscaled-cx32-nbg1-5797432f74cfeeb7 after 86.536869ms: hcloud/CreateRoute: route limit reached (forbidden)"
E0910 07:31:36.091213       1 route_controller.go:329] Could not create route 37d3ec76-e77e-4059-bced-213b30b18df8 10.42.175.0/24 for node k3s-autoscaled-cx32-nbg1-27c56399330a466b: hcloud/CreateRoute: route limit reached (forbidden)
I0910 07:31:36.091298       1 route_controller.go:387] Patching node status k3s-autoscaled-cx32-nbg1-27c56399330a466b with false previous condition was:&NodeCondition{Type:NetworkUnavailable,Status:False,LastHeartbeatTime:2024-09-10 07:31:06 +0000 UTC,LastTransitionTime:2024-09-10 07:31:06 +0000 UTC,Reason:CiliumIsUp,Message:Cilium is running on this node,}
I0910 07:31:36.091576       1 event.go:389] "Event occurred" object="k3s-autoscaled-cx32-nbg1-27c56399330a466b" fieldPath="" kind="Node" apiVersion="" type="Warning" reason="FailedToCreateRoute" message="Could not create route 37d3ec76-e77e-4059-bced-213b30b18df8 10.42.175.0/24 for node k3s-autoscaled-cx32-nbg1-27c56399330a466b after 106.265644ms: hcloud/CreateRoute: route limit reached (forbidden)"
I0910 07:31:36.091586       1 route_controller.go:387] Patching node status k3s-autoscaled-cx32-nbg1-5797432f74cfeeb7 with false previous condition was:&NodeCondition{Type:NetworkUnavailable,Status:False,LastHeartbeatTime:2024-09-10 07:31:06 +0000 UTC,LastTransitionTime:2024-09-10 07:31:06 +0000 UTC,Reason:CiliumIsUp,Message:Cilium is running on this node,}

Is there a way to reset routes manually? Or a way to figure out which routes are really in use?

fatelgit commented 2 months ago

I just deleted about 30 routes for a node with internal IP 10.255.0.4 that was removed hours ago. So I checked the logs for this node:

I0911 05:29:19.036441       1 route_controller.go:290] route spec to be created: &{ k3s-autoscaled-cx32-nbg1-26bc4cf733a14b8 false [{InternalIP 10.255.0.4} {Hostname k3s-autoscaled-cx32-nbg1-26bc4cf733a14b8} {ExternalIP 116.203.22.3}] 10.42.205.0/24 false}
I0911 05:29:19.036504       1 route_controller.go:304] Creating route for node k3s-autoscaled-cx32-nbg1-26bc4cf733a14b8 10.42.205.0/24 with hint 8a63737f-26ce-4f88-9e75-1598c89f8c68, throttled 14.64µs
E0911 05:29:19.036554       1 route_controller.go:329] Could not create route 8a63737f-26ce-4f88-9e75-1598c89f8c68 10.42.205.0/24 for node k3s-autoscaled-cx32-nbg1-26bc4cf733a14b8: hcloud/CreateRoute: hcops/AllServersCache.ByName: k3s-autoscaled-cx32-nbg1-26bc4cf733a14b8 hcops/AllServersCache.getCache: not found
I0911 05:29:19.036591       1 route_controller.go:387] Patching node status k3s-autoscaled-cx32-nbg1-26bc4cf733a14b8 with false previous condition was:&NodeCondition{Type:NetworkUnavailable,Status:False,LastHeartbeatTime:2024-09-11 04:07:14 +0000 UTC,LastTransitionTime:2024-09-11 04:07:14 +0000 UTC,Reason:CiliumIsUp,Message:Cilium is running on this node,}
I0911 05:29:19.036655       1 event.go:389] "Event occurred" object="k3s-autoscaled-cx32-nbg1-26bc4cf733a14b8" fieldPath="" kind="Node" apiVersion="" type="Warning" reason="FailedToCreateRoute" message="Could not create route 8a63737f-26ce-4f88-9e75-1598c89f8c68 10.42.205.0/24 for node k3s-autoscaled-cx32-nbg1-26bc4cf733a14b8 after 44.32µs: hcloud/CreateRoute: hcops/AllServersCache.ByName: k3s-autoscaled-cx32-nbg1-26bc4cf733a14b8 hcops/AllServersCache.getCache: not found"
I0911 05:29:27.867206       1 event.go:389] "Event occurred" object="k3s-autoscaled-cx32-nbg1-26bc4cf733a14b8" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="DeletingNode" message="Deleting node k3s-autoscaled-cx32-nbg1-26bc4cf733a14b8 because it does not exist in the cloud provider"
I0911 05:29:27.924289       1 load_balancers.go:281] "update Load Balancer" op="hcloud/loadBalancers.UpdateLoadBalancer" service="traefik" nodes=["k3s-autoscaled-cx32-nbg1-f976ab63fe71c24","k3s-agent-cx32-nbg1-xiq","k3s-autoscaled-cx32-nbg1-6e1b0889ee0036f2","k3s-autoscaled-cx32-nbg1-43ef964b133bcc7a","k3s-autoscaled-cx32-nbg1-639592727b666e7a","k3s-autoscaled-cx32-nbg1-4cdf23e2d9958bc0"]
I0911 05:29:29.037883       1 load_balancer.go:850] "update service" op="hcops/LoadBalancerOps.ReconcileHCLBServices" port=80 loadBalancerID=1422790
I0911 05:29:30.793144       1 load_balancer.go:850] "update service" op="hcops/LoadBalancerOps.ReconcileHCLBServices" port=443 loadBalancerID=1422790
I0911 05:29:32.514155       1 event.go:389] "Event occurred" object="traefik/traefik" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="UpdatedLoadBalancer" message="Updated load balancer with new hosts"

This shows that there is a DeletingNode event when the node is removed, but the routes are still present.
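As a rough, untested sketch for finding such leftovers (assuming the hcloud CLI and jq are installed, and using my-network as a placeholder network name; neither is from this thread): compare the route destinations configured in the Hetzner network with the Pod CIDRs of the nodes that still exist.

# Pod CIDRs of the nodes that currently exist in the cluster
kubectl get nodes -o jsonpath='{range .items[*]}{.spec.podCIDR}{"\n"}{end}' | sort > node-cidrs.txt

# Destinations of all routes configured in the Hetzner network
hcloud network describe my-network -o json | jq -r '.routes[].destination' | sort > route-cidrs.txt

# Route destinations without a matching node are candidates for manual removal, e.g.:
#   hcloud network remove-route my-network --destination 10.42.205.0/24 --gateway 10.255.0.4
comm -23 route-cidrs.txt node-cidrs.txt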

jooola commented 2 months ago

@fatelgit @pat-s Please open a support ticket on the cloud console https://console.hetzner.cloud/support so we can fix this issue.

apricote commented 2 months ago

As @jooola wrote, if you open an actual support ticket we can use our internal support panels to gain more insights into your projects and see what is happening on our side.


I do think I found the bug without any additional info though.

In the configuration @pat-s posted, you can see that the flag --cluster-cidr= is specified twice: once in the command (10.42.0.0/16) and once in the args (10.244.0.0/16). From a quick local test, it seems that the last flag wins, i.e. the one in args.

Based on the logs @fatelgit posted, it seems like the cluster is configured to assign Node Pod CIDRs in the 10.42.0.0/16 range, which we then use to create routes.

But HCCM only removes routes from the range specified in the --cluster-cidr flag, which is 10.244.0.0/16.

This mismatch leads to the previous routes not being cleaned up. You should change your hcloud-cloud-controller-manager configuration so that it contains only the correct flag for your cluster setup.
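A minimal sketch of what the relevant container spec could look like for this cluster (assuming 10.42.0.0/16 is the range the cluster actually assigns Pod CIDRs from; the remaining flags are taken from the configuration posted above), with --cluster-cidr specified exactly once:

command:
  - /bin/hcloud-cloud-controller-manager
  - --cloud-provider=hcloud
  - --allow-untagged-cloud
  - --allocate-node-cidrs=true
  - --route-reconciliation-period=30s
  - --cluster-cidr=10.42.0.0/16   # must match the Pod CIDR range the cluster assigns
  - --leader-elect=false
  - --webhook-secure-port=0
  - --secure-port=10288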

If I find the time today, I will open an issue with kube-hetzner to explain the problem. But feel free to open one yourself if you are quicker than me or I don't get to it today.

apricote commented 2 months ago

This is explained in our docs: https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/main/docs/deploy_with_networks.md#considerations-on-the-ip-ranges

pat-s commented 2 months ago

@apricote Opened a support ticket.

I found 10.244.0.0/16 hardcoded in https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/cb91679010c9de2046e85187ad61967574f50f6c/deploy/ccm-networks.yaml#L69. My cluster CIDR is in fact 10.42.0.0/16, and if it gets overwritten by 10.244.0.0/16, then I understand why the removal is not working.

It looks like the config sent by kube-hetzner is fully passed as the Command, whereas it should likely be sent as Command and Args so that it overwrites the defaults of HCCM?
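For context (general Kubernetes behaviour, not specific to kube-hetzner): when both command and args are set, they are simply concatenated into the container's argv, so a flag that appears in both ends up twice on the command line; per the local test mentioned above, the last occurrence wins. A minimal sketch:

command: ["/bin/hcloud-cloud-controller-manager", "--cluster-cidr=10.42.0.0/16"]
args: ["--cluster-cidr=10.244.0.0/16"]
# argv: /bin/hcloud-cloud-controller-manager --cluster-cidr=10.42.0.0/16 --cluster-cidr=10.244.0.0/16
# the later --cluster-cidr=10.244.0.0/16 is the one that takes effect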

pat-s commented 2 months ago

It seems like this change here might be responsible: https://github.com/hetznercloud/hcloud-cloud-controller-manager/commit/2ba40588d3b3b44ac3c0fa4ff9ae9e9fd3336cc9

Maybe updating https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/blob/master/templates/ccm.yaml.tpl to align with the recent changes might already do it?

apricote commented 2 months ago

I opened https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/1477

In general, this is not an issue with hcloud-cloud-controller-manager but rather a misconfiguration by the user (through kube-hetzner). Hetzner does not provide official support for this.

apricote commented 4 weeks ago

The fix for this was released in kube-hetzner v2.14.5.

I will close the issue; please respond if you still encounter the same problem.