hetznercloud / hcloud-cloud-controller-manager

Kubernetes cloud-controller-manager for Hetzner Cloud
Apache License 2.0

ccm route controller doesn't create route with calico cni #716

Closed tagurus closed 3 months ago

tagurus commented 3 months ago

TL;DR

I have a sample 2-node k8s cluster with an internal network, kube version 1.28.10, ccm 1.20.0. Previously I used the flannel cni in vxlan mode and it worked correctly with the ccm: the route was created and traffic went through the internal network.

Now I am trying to use calico with vxlan CrossSubnet. The network is up, but the ccm doesn't create a route on the Hetzner network and traffic goes through the default external gateway.

I have read some similar issues, but I have no idea how to fix this.
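For reference, routes are only managed when the ccm is deployed with network support; my deployment is roughly this shape (a sketch with placeholder values, not the exact manifest):

containers:
  - name: hcloud-cloud-controller-manager
    image: hetznercloud/hcloud-cloud-controller-manager:v1.20.0
    command:
      - "/bin/hcloud-cloud-controller-manager"
      - "--cloud-provider=hcloud"
      - "--allow-untagged-cloud"
      - "--allocate-node-cidrs=true"    # route controller publishes each node's spec.podCIDR
      - "--cluster-cidr=10.2.0.0/16"    # placeholder; should match the CNI pod CIDR
    env:
      - name: HCLOUD_TOKEN
        valueFrom:
          secretKeyRef:
            name: hcloud                # placeholder secret name
            key: token
      - name: HCLOUD_NETWORK            # enables the route controller against this network
        valueFrom:
          secretKeyRef:
            name: hcloud
            key: network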

Expected behavior

The ccm creates a route on Hetzner Cloud and pod-to-pod traffic goes through the internal network.

[screenshot]

Observed behavior

The ccm doesn't create a route; traffic goes through the external network interface (default route).
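One way to see what the route controller actually created on the Hetzner side (assuming the hcloud CLI is installed; the network name is a placeholder):

# routes on the Hetzner Cloud network should include one pod CIDR route per node
hcloud network describe <network-name>

With flannel one route per node shows up here; with the calico setup above no route appears.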

Minimal working example

No response

Log output

ccm logs

Flag --allow-untagged-cloud has been deprecated, This flag is deprecated  and will be removed in a future release. A cluster-id will be required on cloud instances.
I0813 12:56:20.481881       1 serving.go:380] Generated self-signed cert in-memory
W0813 12:56:20.482035       1 client_config.go:659] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0813 12:56:21.037023       1 metrics.go:69] Starting metrics server at :8233
I0813 12:56:21.890753       1 cloud.go:127] Hetzner Cloud k8s cloud controller v1.20.0 started
W0813 12:56:21.891162       1 main.go:75] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
I0813 12:56:21.891413       1 controllermanager.go:169] Version: v0.0.0-master+$Format:%H$
I0813 12:56:21.901102       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0813 12:56:21.901548       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0813 12:56:21.901763       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0813 12:56:21.901796       1 shared_informer.go:313] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0813 12:56:21.901866       1 shared_informer.go:313] Waiting for caches to sync for RequestHeaderAuthRequestController
I0813 12:56:21.901381       1 secure_serving.go:213] Serving securely on [::]:10258
I0813 12:56:21.902061       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0813 12:56:21.902611       1 shared_informer.go:313] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0813 12:56:21.912210       1 controllermanager.go:338] Started "cloud-node-controller"
I0813 12:56:21.912478       1 controllermanager.go:338] Started "cloud-node-lifecycle-controller"
I0813 12:56:21.912731       1 node_controller.go:164] Sending events to api server.
I0813 12:56:21.912842       1 node_controller.go:173] Waiting for informer caches to sync
I0813 12:56:21.912945       1 node_lifecycle_controller.go:113] Sending events to api server
I0813 12:56:21.913541       1 controllermanager.go:338] Started "service-lb-controller"
I0813 12:56:21.913835       1 controller.go:231] Starting service controller
I0813 12:56:21.913961       1 shared_informer.go:313] Waiting for caches to sync for service
I0813 12:56:21.991873       1 controllermanager.go:338] Started "node-route-controller"
I0813 12:56:21.992727       1 route_controller.go:104] Starting route controller
I0813 12:56:21.993158       1 shared_informer.go:313] Waiting for caches to sync for route
I0813 12:56:22.003363       1 shared_informer.go:320] Caches are synced for RequestHeaderAuthRequestController
I0813 12:56:22.005736       1 shared_informer.go:320] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0813 12:56:22.005860       1 shared_informer.go:320] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0813 12:56:22.014543       1 shared_informer.go:320] Caches are synced for service
I0813 12:56:22.093552       1 shared_informer.go:320] Caches are synced for route

node route table

default via 172.31.1.1 dev eth0 proto dhcp src 65.109.** metric 100
10.0.0.0/8 via 10.0.0.1 dev enp7s0
10.0.0.1 dev enp7s0 scope link
10.2.44.0/24 via 10.2.44.0 dev vxlan.calico onlink
blackhole 10.2.154.0/24 proto 80
10.2.154.1 dev calif281218ad6d scope link
10.2.154.2 dev cali12d4a061371 scope link
172.31.1.1 dev eth0 proto dhcp scope link src 65.109.167.179 metric 100
185.12.64.1 via 172.31.1.1 dev eth0 proto dhcp src 65.109.167.179 metric 100
185.12.64.2 via 172.31.1.1 dev eth0 proto dhcp src 65.109.167.179 metric 100

trace pod-to-pod between nodes

traceroute to 10.2.44.9 (10.2.44.9), 30 hops max, 46 byte packets
 1  static.179.167..clients.your-server.de (65.109.)  0.021 ms  0.010 ms  0.007 ms
 2  10.2.44.0 (10.2.44.0)  2.198 ms  0.957 ms  0.706 ms
 3  10.2.44.9 (10.2.44.9)  0.464 ms  0.635 ms  0.262 ms

calico spec

spec:
  allowedUses:
    - Workload
    - Tunnel
  blockSize: 24
  cidr: 10.2.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: CrossSubnet

Additional information

No response

tagurus commented 3 months ago

An additional experiment shows that traffic does go through the internal network interface, but it looks like the cni's native routing mechanism, without the hcloud route controller.
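A quick way to check whether the route controller even has matching data to work from is to compare the node podCIDRs assigned by Kubernetes with the blocks calico actually allocates (sketch; assumes calicoctl is installed):

# per-node pod CIDRs assigned by Kubernetes (what a route controller would publish)
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR

# blocks allocated by calico IPAM (what pods actually receive)
calicoctl ipam show --show-blocks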

New cluster without the ccm; calico cni installed.

pod-to-pod tracing

ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if31: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1400 qdisc noqueue qlen 1000
    link/ether 1e:26:bb:80:c7:c9 brd ff:ff:ff:ff:ff:ff
    inet 10.2.154.4/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::1c26:bbff:fe80:c7c9/64 scope link
       valid_lft forever preferred_lft forever

traceroute 10.2.44.2
traceroute to 10.2.44.2 (10.2.44.2), 30 hops max, 46 byte packets
 1  static.179.167.109.65.clients.your-server.de (65.109.167.179)  0.028 ms  0.012 ms  0.010 ms
 2  10.2.44.0 (10.2.44.0)  0.586 ms  0.824 ms  0.770 ms
 3  10.2.44.2 (10.2.44.2)  0.025 ms  0.491 ms  0.467 ms

Result: it works, but it doesn't meet expectations.

For comparison: flannel pod-to-pod with the ccm and routes

ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if399: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether 56:cb:89:2a:d6:b3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.225.71/24 brd 192.168.225.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::54cb:89ff:fe2a:d6b3/64 scope link
       valid_lft forever preferred_lft forever

traceroute 192.168.226.175
traceroute to 192.168.226.175 (192.168.226.175), 30 hops max, 46 byte packets
 1  192.168.225.1 (192.168.225.1)  0.005 ms  0.004 ms  0.002 ms
 2  192.168.226.0 (192.168.226.0)  0.787 ms  0.413 ms  0.304 ms
 3  192.168.226.175 (192.168.226.175)  0.406 ms  0.293 ms  0.283 ms

[screenshot]

tagurus commented 3 months ago

Dealt with it: calico works with BGP and without encapsulation.
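For anyone else hitting this, the pool now looks roughly like the one above, just with encapsulation switched off and BGP doing the routing (sketch of the working shape, not the full config):

spec:
  allowedUses:
    - Workload
    - Tunnel
  blockSize: 24
  cidr: 10.2.0.0/16
  ipipMode: Never        # no IP-in-IP encapsulation
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never       # no VXLAN either; routing is handled via BGP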