yakhatape opened 3 weeks ago
It looks like my host (master) is trying to contact the pod network through the wrong interface (ens192) instead of ens224 once a Calico route is created for a pod:
[root@DEV-TEST-FRM-K8S-MASTER01-V ~]# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default _gateway 0.0.0.0 UG 100 0 0 ens224
10.252.144.0 0.0.0.0 255.255.255.0 U 101 0 0 ens192
172.80.6.0 0.0.0.0 255.255.255.192 U 0 0 0 *
172.80.6.65 0.0.0.0 255.255.255.255 UH 0 0 0 cali9e6a86ced88
172.80.6.66 0.0.0.0 255.255.255.255 UH 0 0 0 calic37448fcc9c
172.80.6.67 0.0.0.0 255.255.255.255 UH 0 0 0 cali8b786a18714
172.80.6.68 0.0.0.0 255.255.255.255 UH 0 0 0 calibbe2f41fb94
172.80.6.69 0.0.0.0 255.255.255.255 UH 0 0 0 caliced9b65d725
172.80.6.70 0.0.0.0 255.255.255.255 UH 0 0 0 cali3a70c213492
192.168.143.0 0.0.0.0 255.255.255.0 U 100 0 0 ens224
If I do an ip route get to a pod IP for which a route exists, the host tries to reach it through the NIC ens192 instead of ens224:
[root@DEV-TEST-FRM-K8S-MASTER01-V ~]# ip route get 172.80.6.65
172.80.6.65 dev cali9e6a86ced88 src 10.252.144.23 uid 0
cache
If I try to reach an unused IP for which Calico has not created a route, the traffic uses the correct NIC (ens224):
[root@DEV-TEST-FRM-K8S-MASTER01-V ~]# ip route get 172.80.6.80
172.80.6.80 via 192.168.143.250 dev ens224 src 192.168.143.53 uid 0
cache
Routes on the pod side look fine to me when I check from calico-node:
bird show route :
BIRD v0.3.3+birdv1.6.8 ready.
bird> show route
0.0.0.0/0 via 192.168.143.250 on ens224 [kernel1 09:29:09] * (10)
10.252.144.0/24 dev ens192 [direct1 09:29:09] * (240)
172.80.6.66/32 dev calic37448fcc9c [kernel1 09:29:09] * (10)
172.80.6.67/32 dev cali8b786a18714 [kernel1 09:29:09] * (10)
172.80.6.0/26 blackhole [static1 09:29:09] * (200)
172.80.6.0/32 dev tunl0 [direct1 09:29:09] * (240)
172.80.6.65/32 dev cali9e6a86ced88 [kernel1 09:29:09] * (10)
192.168.143.0/24 dev ens224 [direct1 09:29:09] * (240)
172.80.6.70/32 dev cali3a70c213492 [kernel1 09:29:09] * (10)
172.80.6.68/32 dev calibbe2f41fb94 [kernel1 09:29:09] * (10)
172.80.6.69/32 dev caliced9b65d725 [kernel1 09:29:09] * (10)
bird show status :
bird> show status
BIRD v0.3.3+birdv1.6.8
Router ID is 192.168.143.53
Current server time is 2024-10-01 09:36:16
Last reboot on 2024-10-01 09:29:10
Last reconfiguration on 2024-10-01 09:29:10
Daemon is up and running
Any idea about this behavior?
Context :
We use two NICs on our servers: one (ens192) for admin traffic (SSH) and the other (ens224) for production traffic (app exposure, monitoring data, DNS, etc.). iptables rules are already in place to authorize specific IPs to reach ens192 over SSH, and to authorize specific traffic on ens224 (for example, outbound traffic such as DNS, NTP, LDAP, apps, etc.).
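For illustration, the filtering is roughly of this shape (a simplified sketch, not the real ruleset; the admin source network below is a placeholder):

# Simplified sketch of the kind of filtering described above (placeholder admin network)
iptables -A INPUT -i ens192 -p tcp --dport 22 -s 10.252.144.0/24 -j ACCEPT   # SSH only from the admin network
iptables -A OUTPUT -o ens224 -p udp --dport 53 -j ACCEPT                     # DNS out on the production NIC
iptables -A OUTPUT -o ens224 -p udp --dport 123 -j ACCEPT                    # NTP out on the production NIC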
We also have specific source-based policy routing on the servers (to avoid asymmetric routing): all traffic coming in through ens192 should go back out through ens192, and likewise for ens224. For that, we have two policy rules in place, each pointing at its own routing table (a sketch of how they are typically set up follows):
500 => ens224
600 => ens192
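Roughly, a minimal sketch of how such rules are typically created (addresses taken from the outputs above; the ens192 gateway 10.252.144.1 is assumed, and the exact commands on the hosts may differ):

# Table 500: traffic sourced from the production address (ens224) goes back out via ens224
ip route add default via 192.168.143.250 dev ens224 table 500
ip rule add from 192.168.143.53/32 lookup 500 priority 500
# Table 600: traffic sourced from the admin address (ens192) goes back out via ens192
ip route add default via 10.252.144.1 dev ens192 table 600   # gateway assumed
ip rule add from 10.252.144.23/32 lookup 600 priority 600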
The main default route is over ens224:
ip rule list:
Expected Behavior
Calico should respect routing over ens224
Current Behavior
Calico creates the following routes:
blackhole 172.80.6.0/26 proto 80
172.80.6.1 dev calib670fd5cc65 scope link
172.80.6.2 dev caliba0d39000ce scope link
172.80.6.3 dev calibbe91b2e23c scope link
172.80.6.4 dev caliba9ebb5cc2a scope link
Once the Calico routes are created, routing appears broken and traffic ends up going out through ens192 (10.252.x):
Calico is installed after the cluster is initialized, with the following steps:
1) kubeadm init --apiserver-advertise-address=192.168.143.53 --pod-network-cidr=172.80.0.0/21 --control-plane-endpoint=192.168.143.53:6443
2) kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/tigera-operator.yaml
3) kubectl apply -f custom-resources.yaml
custom-resources.yaml contains:
I was thinking it might be an issue with the IPv4 address autodetection, which is why I added "interface: ens224" a few minutes ago, but nothing changed.
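For reference, a minimal sketch of what such an Installation resource can look like with IPv4 autodetection pinned to ens224 (the CIDR matches the kubeadm init above; the encapsulation and block size are illustrative assumptions, not necessarily the exact content of my custom-resources.yaml):

# Sketch only: illustrative values, not the exact custom-resources.yaml used here
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    # Pin IPv4 address autodetection to the production NIC
    nodeAddressAutodetectionV4:
      interface: ens224
    ipPools:
      - cidr: 172.80.0.0/21            # matches --pod-network-cidr
        blockSize: 26                  # assumed (the /26 blackhole route suggests this)
        encapsulation: IPIPCrossSubnet # assumed; the tunl0 device suggests IPIP
        natOutgoing: Enabled
        nodeSelector: all()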
None of the pods with an IP in 172.80.6.x are in READY status:
And I can see some martian-source warnings in the kernel logs when pods try to communicate with the master node IP:
Possible Solution
I don't know
Your Environment
Can someone help me debug this behavior? I remain at your disposal for any questions or further details.