rancher / rke2

https://docs.rke2.io/
Apache License 2.0

rke2-canal: calico can't change iface with IP_AUTODETECTION_METHOD #5925

Closed: dynastini1 closed this issue 2 months ago

dynastini1 commented 4 months ago

Environmental Info: RKE2 Version: v1.28.9+rke2r1

Node(s) CPU architecture, OS, and Version: Linux prod-master-1 6.8.0-31-generic #31-Ubuntu SMP PREEMPT_DYNAMIC Sat Apr 20 00:40:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 3 servers (master nodes)

My servers have two interfaces: eth0 (public) and enp7s0 (private, 10.0.0.0/8).

Describe the bug: I want Calico to use the private interface. Flannel works fine, but Calico always uses eth0 (the cali interfaces show @if2, i.e. eth0):

root@prod-master-1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 96:00:03:51:b0:9e brd ff:ff:ff:ff:ff:ff
    inet 159.69.x.x/32 metric 100 scope global dynamic eth0
       valid_lft 86029sec preferred_lft 86029sec
    inet6 ip::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 ip/64 scope link
       valid_lft forever preferred_lft forever
3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 86:00:00:8b:70:31 brd ff:ff:ff:ff:ff:ff
    inet 10.1.0.15/32 brd 10.1.0.15 scope global dynamic enp7s0
       valid_lft 86034sec preferred_lft 75234sec
    inet6 ip/64 scope link
       valid_lft forever preferred_lft forever
6: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default
    link/ether de:ca:d0:6c:da:13 brd ff:ff:ff:ff:ff:ff
    inet 10.42.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 ip/64 scope link
       valid_lft forever preferred_lft forever
7: cali7394c0c7203@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-f1e44223-8bb4-ed40-a915-4c508f61df4c
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
10: cali46f7c8177d1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-f0910828-2ca9-19a3-afa5-6f4f8022ab8b
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever

I tried this config:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-canal
  namespace: kube-system
spec:
  valuesContent: |-
    flannel:
      iface: "enp7s0"
    calico:
      ipAutoDetectionMethod: "interface=enp7s0"
      # ipAutoDetectionMethod: "cidr=10.0.0.0/8"        # or like this
      # ipAutoDetectionMethod: skip-interface=eth.*   # or like this

After applying that config, the canal pod (rke2-canal-<pod>) does get IP_AUTODETECTION_METHOD in its env:

    - name: IP_AUTODETECTION_METHOD
      value: cidr=10.0.0.0/8

but it didn't help:

❯ kubectl -n kube-system logs daemonsets/rke2-canal -c calico-node | grep -i enp
Found 3 pods, using pod/rke2-canal-fxpxx
2024-05-18 14:05:14.282 [INFO][56] felix/int_dataplane.go 1387: Linux interface state changed. ifIndex=3 ifaceName="enp7s0" state="up"
2024-05-18 14:05:14.283 [INFO][56] felix/int_dataplane.go 1431: Linux interface addrs changed. addrs=set.Set{10.1.0.15,fe80::8400:ff:fe8b:7031} ifaceName="enp7s0"
2024-05-18 14:05:14.283 [INFO][56] felix/int_dataplane.go 2011: Received interface update msg=&intdataplane.ifaceStateUpdate{Name:"enp7s0", State:"up", Index:3}
2024-05-18 14:05:14.283 [INFO][56] felix/int_dataplane.go 2031: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"enp7s0", Addrs:set.Typed[string]{"10.1.0.15":set.v{}, "fe80::8400:ff:fe8b:7031":set.v{}}}
2024-05-18 14:05:14.283 [INFO][56] felix/hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"enp7s0", Addrs:set.Typed[string]{"10.1.0.15":set.v{}, "fe80::8400:ff:fe8b:7031":set.v{}}}
❯ kubectl -n kube-system logs daemonsets/rke2-canal -c calico-node | grep -i eth0
Found 3 pods, using pod/rke2-canal-fxpxx
2024-05-18 14:05:14.281 [INFO][56] felix/int_dataplane.go 1387: Linux interface state changed. ifIndex=2 ifaceName="eth0" state="up"
2024-05-18 14:05:14.282 [INFO][56] felix/int_dataplane.go 1431: Linux interface addrs changed. addrs=set.Set{159.69.x.x,ipv6} ifaceName="eth0"
2024-05-18 14:05:14.282 [INFO][56] felix/int_dataplane.go 2011: Received interface update msg=&intdataplane.ifaceStateUpdate{Name:"eth0", State:"up", Index:2}
2024-05-18 14:05:14.283 [INFO][56] felix/int_dataplane.go 2031: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"eth0", Addrs:set.Typed[string]{"159.69.x.x":set.v{}, "<cleaned>":set.v{}, "fe80::9400:3ff:fe51:b09e":set.v{}}}
2024-05-18 14:05:14.283 [INFO][56] felix/hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"eth0", Addrs:set.Typed[string]{"159.69.x.x":set.v{}, "<cleaned>":set.v{}, "<cleaned>":set.v{}}}
2024-05-18 14:07:16.971 [INFO][56] felix/calc_graph.go 507: Local endpoint updated id=WorkloadEndpoint(node=prod-master-1, orchestrator=k8s, workload=kube-system/rke2-coredns-rke2-coredns-84b9cb946c-cfvfh, name=eth0)
2024-05-18 14:07:16.973 [INFO][56] felix/int_dataplane.go 1954: Received *proto.WorkloadEndpointUpdate update from calculation graph msg=id:<orchestrator_id:"k8s" workload_id:"kube-system/rke2-coredns-rke2-coredns-84b9cb946c-cfvfh" endpoint_id:"eth0" > endpoint:<state:"active" name:"cali7394c0c7203" profile_ids:"kns.kube-system" profile_ids:"ksa.kube-system.coredns" ipv4_nets:"10.42.0.2/32" >
2024-05-18 14:07:16.973 [INFO][56] felix/endpoint_mgr.go 602: Updating per-endpoint chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/rke2-coredns-rke2-coredns-84b9cb946c-cfvfh", EndpointId:"eth0"}
2024-05-18 14:07:16.973 [INFO][56] felix/endpoint_mgr.go 648: Updating endpoint routes. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/rke2-coredns-rke2-coredns-84b9cb946c-cfvfh", EndpointId:"eth0"}

Also, the server's IP routes:

root@prod-master-1:~# ip route
default via 172.31.1.1 dev eth0 proto dhcp src 159.69.x.x metric 100
10.1.0.0/16 via 10.1.0.1 dev enp7s0 proto dhcp src 10.1.0.15 metric 1003 mtu 1450
10.1.0.1 dev enp7s0 proto dhcp scope link src 10.1.0.15 metric 1003 mtu 1450
10.42.0.2 dev cali7394c0c7203 scope link
10.42.0.5 dev cali46f7c8177d1 scope link
10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
172.31.1.1 dev eth0 proto dhcp scope link src 159.69.x.x metric 100
185.12.64.1 via 172.31.1.1 dev eth0 proto dhcp src 159.69.x.x metric 100
185.12.64.2 via 172.31.1.1 dev eth0 proto dhcp src 159.69.x.x metric 100

Expected behavior: I'm using rke2-canal as the CNI on a new, empty cluster. How can I make Calico use enp7s0 for the cali interfaces? Thanks!

brandond commented 4 months ago

cc @manuelbuil

manuelbuil commented 4 months ago

Looking at Calico's code, it seems IP_AUTODETECTION_METHOD is ignored when the Tigera operator is not used, and Canal does not use the operator. However, Canal uses the flannel interface for inter-node communication, so it does not really matter which interface Calico picks, because that interface is never used. The Calico part of Canal only handles intra-node communication. You can see this in the routes you posted:

10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink

As soon as traffic is destined for a pod on another node, it uses the flannel interface.
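To make the routing argument concrete, here is a minimal Python sketch (an illustration only, not RKE2, Calico, or kernel code) that performs a longest-prefix-match lookup over the `ip route` output posted in the issue. It assumes only the destination and `dev` fields matter and ignores metrics, source addresses, and policy routing, all of which a real kernel lookup also considers. The names `parse_routes` and `lookup_device` are hypothetical.

```python
"""Longest-prefix-match lookup over `ip route` output (illustrative sketch)."""
import ipaddress

# Route table copied from the issue (the public IP stays sanitized as 159.69.x.x;
# only the destination and "dev" fields are parsed, so that does not matter).
ROUTES = """\
default via 172.31.1.1 dev eth0 proto dhcp src 159.69.x.x metric 100
10.1.0.0/16 via 10.1.0.1 dev enp7s0 proto dhcp src 10.1.0.15 metric 1003 mtu 1450
10.1.0.1 dev enp7s0 proto dhcp scope link src 10.1.0.15 metric 1003 mtu 1450
10.42.0.2 dev cali7394c0c7203 scope link
10.42.0.5 dev cali46f7c8177d1 scope link
10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
172.31.1.1 dev eth0 proto dhcp scope link src 159.69.x.x metric 100
185.12.64.1 via 172.31.1.1 dev eth0 proto dhcp src 159.69.x.x metric 100
185.12.64.2 via 172.31.1.1 dev eth0 proto dhcp src 159.69.x.x metric 100
"""


def parse_routes(text):
    """Return a list of (network, device) pairs from `ip route` lines."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or "dev" not in fields:
            continue
        dest = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        # A bare address such as 10.42.0.2 becomes a /32 host route.
        network = ipaddress.ip_network(dest, strict=False)
        device = fields[fields.index("dev") + 1]
        routes.append((network, device))
    return routes


def lookup_device(routes, destination):
    """Return the device of the most specific (longest-prefix) matching route."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, dev) for net, dev in routes if addr in net]
    if not matches:
        return None
    # Metrics are ignored here; the longest prefix alone decides.
    _, device = max(matches, key=lambda match: match[0].prefixlen)
    return device


if __name__ == "__main__":
    table = parse_routes(ROUTES)
    print(lookup_device(table, "10.42.1.17"))  # pod on another node -> flannel.1
    print(lookup_device(table, "10.42.0.2"))   # local pod -> cali7394c0c7203
    print(lookup_device(table, "10.1.0.20"))   # private network -> enp7s0
    print(lookup_device(table, "1.1.1.1"))     # everything else -> eth0
```

The point of the sketch is that cross-node pod traffic matches the 10.42.x.0/24 routes on flannel.1 before it could ever fall through to the default route on eth0. On a live node, `ip route get <pod-ip>` asks the kernel for the same decision directly.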

I think we should remove this variable from the chart's values, because it can confuse users.

github-actions[bot] commented 3 months ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 45 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.