I also found an event that looks relevant for one of my nodes (k3s-01-agent-small-nbg1-vvj) in the cluster:
Could not create route fac268fa-acab-4287-bc7f-5008bb1790cf 10.20.128.0/24 for node k3s-01-agent-small-nbg1-vvj after 398.38809ms: hcloud/CreateRoute: invalid gateway (invalid_input)
When looking at the pod logs of hcloud-cloud-controller-manager, it looks like there is some routing issue:
2024-02-22T13:42:11+01:00 I0222 12:42:11.829375 1 route_controller.go:216] action for Node "k3s-01-control-plane-hel1-iwm" with CIDR "10.20.132.0/24": "keep"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829410 1 route_controller.go:216] action for Node "k3s-01-control-plane-nbg1-oze" with CIDR "10.20.131.0/24": "keep"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829422 1 route_controller.go:216] action for Node "k3s-01-agent-small-nbg1-vvj" with CIDR "10.20.128.0/24": "add"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829433 1 route_controller.go:216] action for Node "k3s-01-agent-small-nbg1-yiv" with CIDR "10.20.129.0/24": "keep"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829445 1 route_controller.go:216] action for Node "k3s-01-control-plane-fsn1-ywt" with CIDR "10.20.130.0/24": "keep"
2024-02-22T13:42:11+01:00 I0222 12:42:11.829459 1 route_controller.go:290] route spec to be created: &{ k3s-01-agent-small-nbg1-vvj false [{InternalIP 10.20.128.101} {Hostname k3s-01-agent-small-nbg1-vvj} {ExternalIP XX.XX.XX.XX}] 10.20.128.0/24 false}
2024-02-22T13:42:11+01:00 I0222 12:42:11.829493 1 route_controller.go:304] Creating route for node k3s-01-agent-small-nbg1-vvj 10.20.128.0/24 with hint fac268fa-acab-4287-bc7f-5008bb1790cf, throttled 12.44µs
2024-02-22T13:42:12+01:00 E0222 12:42:12.401242 1 route_controller.go:329] Could not create route fac268fa-acab-4287-bc7f-5008bb1790cf 10.20.128.0/24 for node k3s-01-agent-small-nbg1-vvj: hcloud/CreateRoute: invalid gateway (invalid_input)
2024-02-22T13:42:12+01:00 I0222 12:42:12.401365 1 route_controller.go:387] Patching node status k3s-01-agent-small-nbg1-vvj with false previous condition was:&NodeCondition{Type:NetworkUnavailable,Status:False,LastHeartbeatTime:2024-02-22 12:42:00 +0000 UTC,LastTransitionTime:2024-02-22 12:42:00 +0000 UTC,Reason:CiliumIsUp,Message:Cilium is running on this node,}
2024-02-22T13:42:12+01:00 I0222 12:42:12.401535 1 event.go:307] "Event occurred" object="k3s-01-agent-small-nbg1-vvj" fieldPath="" kind="Node" apiVersion="" type="Warning" reason="FailedToCreateRoute" message="Could not create route fac268fa-acab-4287-bc7f-5008bb1790cf 10.20.128.0/24 for node k3s-01-agent-small-nbg1-vvj after 571.712557ms: hcloud/CreateRoute: invalid gateway (invalid_input)"
Has anyone experienced this before?
Thanks for sharing @tobiasehlert. @M4t7e FYI, this is happening with Cilium.
I suspect it's because of the Cilium routing mode "native". @tobiasehlert, please remove that line and let us know 🙏
Yes, from what I've seen so far it looks exactly like that.. I just removed the whole cluster and created a new one with cilium_routing_mode set to tunnel, but there was no difference at all @mysticaltech.
To me it looks like the hcloud CSI components are the issue in this case.. but I can't get my head around it.
@tobiasehlert Weird, it's the first time we hear of that. Please inspect and share your hcloud CCM and CSI logs if you suspect they are causing the issue. Also please have a look at our readme's debug section and try some general node-level debugging just in case. The hcloud CLI can be useful here too, to inspect the routes and such.
Hey @tobiasehlert, HCCM is already hinting at what's wrong here:
Could not create route fac268fa-acab-4287-bc7f-5008bb1790cf 10.20.128.0/24 for node k3s-01-agent-small-nbg1-vvj: hcloud/CreateRoute: invalid gateway (invalid_input)
Overview:
- 10.20.128.0/17 (Hetzner Network)
- 10.20.128.0/20 (Reserved for K8s Pod Networks -> HCCM RouteController)
- k3s-01-agent-small-nbg1-vvj: 10.20.128.101

HCCM RouteController tried to add the Pod network route 10.20.128.0/24 (probably matching 1:1 with the subnet of the server itself) with 10.20.128.101 as the gateway. That gateway address lies inside the route's own destination range 10.20.128.0/24, because the Pod network collides with the Hetzner Subnet the node lives in, and the Hetzner API rejects such a route as an invalid gateway.

You have to leave enough space at the beginning and at the end of network_ipv4_cidr for Hetzner Networks, so that they don't collide with Pod and Service CIDRs (especially at the beginning of the ranges).
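To make the failure concrete, here is a hedged sketch of what that API call amounts to, expressed with the hcloud Terraform provider's hcloud_route resource (the resource and network names are hypothetical; the addresses come from the logs above):

# Rough equivalent of the route HCCM tries to create via the Hetzner API.
# Resource names are hypothetical; values are taken from this thread.
resource "hcloud_network" "k3s" {
  name     = "k3s-01"          # hypothetical network name
  ip_range = "10.20.128.0/17"  # the network_ipv4_cidr in this setup
}

resource "hcloud_route" "pod_route" {
  network_id  = hcloud_network.k3s.id
  destination = "10.20.128.0/24" # Pod CIDR picked for node k3s-01-agent-small-nbg1-vvj
  gateway     = "10.20.128.101"  # the node's IP, which falls inside the destination
                                 # range -> the API answers: invalid gateway (invalid_input)
}

Applying this should fail the same way HCCM does, which is a quick way to reproduce the collision outside the cluster.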
Thanks for your response @M4t7e!
What size should the Service and Cluster CIDRs each be? Do you have some suggestions there?
@tobiasehlert Yeah, sure. Here are some considerations for the subnetting...
You need enough space for Hetzner Subnets. Total limit today is 50 Subnets per Network (see https://docs.hetzner.com/cloud/networks/faq#are-there-any-limits-on-how-networks-can-be-used).
For routing configuration simplicity, it's best if cluster_ipv4_cidr falls within network_ipv4_cidr. The cluster_ipv4_cidr will use most IPs, since they are allocated for the Pods, and Hetzner CCM reserves larger ranges for the Nodes, adding the Pod routes with the corresponding Node IP as the gateway. Max 100 routes per Network are possible (see Hetzner FAQ). service_ipv4_cidr typically requires less space compared to the Pods.
Hetzner Subnets and Pod Networks are both allocated in ascending order. Therefore, we could disregard the Server Node Subnets at the end (it's highly unlikely they will ever be used) if we aim to save space.
One example could be like this:
- network_ipv4_cidr: 10.0.0.0/16 (sufficient for 64 /24 Subnets -> you can treat only 10.0.0.0/18 as reserved for them)
- service_ipv4_cidr: 10.0.64.0/18 (half the size of cluster_ipv4_cidr)
- cluster_dns_ipv4: 10.0.64.10 (has to be in service_ipv4_cidr)
- cluster_ipv4_cidr: 10.0.128.0/17 (biggest range for Pods -> more than 100 /24 networks/routes for Pods)
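Expressed as kube.tf variables, that layout would look like the sketch below (the variable names are the same ones used in the reply that follows; treat this as an illustration, not a tested config):

# M4t7e's example layout as kube.tf values (illustration only)
network_ipv4_cidr = "10.0.0.0/16"   # Hetzner Network; keep the first /18 free for Subnets
service_ipv4_cidr = "10.0.64.0/18"  # Services (half the size of the Pod range)
cluster_dns_ipv4  = "10.0.64.10"    # must sit inside service_ipv4_cidr
cluster_ipv4_cidr = "10.0.128.0/17" # Pods; leaves room for 100+ /24 node routes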
Thanks @M4t7e!
I'll go for this then :)
network_ipv4_cidr = "10.20.128.0/17"
service_ipv4_cidr = "10.20.160.0/19"
cluster_ipv4_cidr = "10.20.192.0/18"
cluster_dns_ipv4 = "10.20.160.10"
Thanks @M4t7e, excellent! Should've had a better look at the kube.tf.
@tobiasehlert When you change IP ranges, you really have to know what you are doing and get a good look at what it affects within the code. For most scenarios, you can just keep the defaults as they are proven to work well.
Yeah, I saw that note about changing CIDRs, but I had to because of some overlapping CIDRs :( But yeah, thanks to @M4t7e it works now.. I was unaware of how to portion up the subnets, but now it rocks :D
Description
I'm using the Terraform provider version v2.12.0 with Cilium v1.15.1 and K3s v1.28.6+k3s2.
I can't really get my head around why the Kured pod can't communicate with the kubernetes service running on https://10.20.144.1:443 in my cluster, but it results in a restarted pod due to timeout.
Kube.tf file
Screenshots
No response
Platform
Linux