projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/

service updates are not propagated via etcd/typha to calico-node #9229

Closed iksoon-park closed 1 month ago

iksoon-park commented 1 month ago

Deploy a Pod with the following options in a Calico cluster:

Calico Setting

Pod spec

Run the "nslookup kubernetes" command on that pod

It has been confirmed that there is an issue with domain queries not being made using coreDNS.

This is the Pod YAML used for testing:

apiVersion: v1
kind: Pod
metadata:
  name: netshoot
  labels:
    app: netshoot
spec:
  containers:
  - name: netshoot
    image: nicolaka/netshoot
    imagePullPolicy: IfNotPresent
    command: ["/bin/sleep"]
    args: ["3650d"]
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
  restartPolicy: Always

After deploying the pod, I ran the following commands:

kubectl exec -it netshoot -- bash
nslookup kubernetes

The execution results are as follows. [screenshot]

Expected Behavior

I want the command to execute successfully. [screenshot]

Current Behavior

The test environment is as follows:

  1. Pod deployment info [screenshot]

  2. Worker node info [screenshot]

  3. "worker node" nat tables kubectl exec -it calico-node-76h79 -n kube-system -- calico-node -bpf nat dump --log-level debug image

  4. "worker node" routing table image

  5. tcpdump log [screenshot]
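
As a quick cross-check (a minimal sketch, reusing the calico-node pod name from item 3 and the kube-dns ClusterIP 10.254.0.10 shown later in this thread; the exact dump format may differ between versions), the NAT entries for the DNS service can be filtered out of the BPF NAT dump:

# show only the frontend/backend entries for the kube-dns ClusterIP
kubectl exec -it calico-node-76h79 -n kube-system -- calico-node -bpf nat dump | grep -A 2 10.254.0.10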

Let's check the packet below.

22:27:24.869306 calicc879659490 Out IP 192.168.0.127.58218 > 10.100.51.12.53: 902+ A? kubernetes.default.svc.cluster.local. (54)
22:27:24.870113 calicc879659490 In  IP 10.100.51.12.53 > 192.168.0.127.58218: 902*- 1/0/0 A 10.254.0.1 (106)

The problem situation can be expressed in a diagram as follows: [screenshot]

Let's check the BPF logs for each section.
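
For reference, the addresses in the bpf_trace_printk lines below are printed as hex. A minimal decoding sketch (the address roles are taken from this test environment):

printf '%d.%d.%d.%d\n' 0xc0 0xa8 0x00 0x7f   # c0a8007f -> 192.168.0.127 (worker node IP)
printf '%d.%d.%d.%d\n' 0x0a 0xfe 0x00 0x0a   # afe000a  -> 10.254.0.10  (kube-dns ClusterIP)
printf '%d.%d.%d.%d\n' 0x0a 0x64 0x33 0x0c   # a64330c  -> 10.100.51.12 (CoreDNS pod IP, the NAT backend)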

1. Sending packets from the host to CoreDNS

The destination is changed by NAT as follows. This is OK:

isc-net-0000-1685825 bpf_trace_printk: eth0------------E: New packet at ifindex=2; mark=0
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: IP id=64500
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: IP s=c0a8007f d=afe000a
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: IP ihl=20 bytes
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: UDP; ports: s=58218 d=53
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: CT: lookup from c0a8007f:58218
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: CT: lookup to   afe000a:53
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: CT: Miss.
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: CT: result: NEW.
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: conntrack entry flags 0x0
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: NAT: 1st level lookup addr=afe000a port=53 udp
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: NAT: 1st level hit; id=2
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: NAT: 1st level hit; id=2 ordinal=1
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: NAT: backend selected a64330c:53
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: Socket cookie: 25ca3
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: Source IP is local host.
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: Outbound failsafe port: 53. Skip policy.
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: jump to idx 2 prog at 22
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: Entering calico_tc_skb_accepted_entrypoint
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: Entering calico_tc_skb_accepted
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: src=c0a8007f dst=afe000a
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: post_nat=a64330c:53
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: tun_ip=0
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: pol_rc=1
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: sport=58218
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: dport=53
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: flags=20
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: ct_rc=0
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: ct_related=0
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: mark=0x1000000
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: ip->ttl 64
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: jump to idx 7 prog at 27
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: Entering calico_tc_skb_new_flow
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: Allowed by policy: ACCEPT
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: CT: DNAT to a64330c:53
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: CT-ALL packet mark is: 0x0
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: CT-ALL Creating tracking entry type 2 at 1446810034128900.
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: CT-ALL tracking entry flags 0x0
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: CT-ALL SNAT orig c0a8007f:58218
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: NEW src_to_dst->ifindex 0
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: CT-ALL approved both due to host source port conflict resolution.
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: CT-17 Creating FWD entry at 1446810034136591.
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: FWD c0a8007f -> afe000a
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: Fixing UDP source port from 58218 to 58218
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: DNAT L3 csum at 24 L4 csum at 40
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: L4 checksum update dst IP from afe000a to a64330c
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: bpf_l4_csum_diff(IP): 0x2326600
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: NP local WL a64330c:53 on HEP
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: FIB family=2
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: FIB tot_len=0
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: FIB ifindex=2
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: FIB l4_protocol=17
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: FIB sport=58218
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: FIB dport=53
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: FIB ipv4_src=c0a8007f
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: FIB ipv4_dst=a64330c
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: Traffic is towards the host namespace, doing Linux FIB lookup
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: FIB lookup succeeded - with neigh
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: Got Linux FIB hit, redirecting to iface 18.
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: Setting mark to 0x1000000
isc-net-0000-1685825 bpf_trace_printk: eth0------------E: Final result=ALLOW (0). Program execution time: 59942ns

2. The packet arrives at the CoreDNS pod

The CoreDNS pod receives the packet. This is OK:

isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: New packet at ifindex=18; mark=1000000
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: IP id=64500
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: IP s=c0a8007f d=a64330c
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: IP ihl=20 bytes
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: UDP; ports: s=58218 d=53
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: CT: lookup from c0a8007f:58218
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: CT: lookup to   a64330c:53
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: CT: tun_ip:0
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: CT: Hit! NAT REV entry but not connection opener: ESTABLISHED.
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: CT: result: 0x2002
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: conntrack entry flags 0x100
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: CT Hit
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: jump to idx 2 prog at 91
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: Entering calico_tc_skb_accepted_entrypoint
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: Entering calico_tc_skb_accepted
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: src=c0a8007f dst=a64330c
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: post_nat=0:0
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: tun_ip=0
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: pol_rc=1
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: sport=58218
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: dport=53
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: flags=20
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: ct_rc=2
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: ct_related=0
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: mark=0x1000000
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: ip->ttl 63
isc-net-0000-1685825  bpf_trace_printk: calicc879659490-I: Final result=ALLOW (0). Program execution time: 12413

3. CoreDNS sends a response and forwards the packet to the host

Changed by NAT as follows. In my opinion, this is also okay:

coredns-1678826  bpf_trace_printk: calicc879659490-E: New packet at ifindex=18; mark=0
coredns-1678826  bpf_trace_printk: calicc879659490-E: IP id=62779
coredns-1678826  bpf_trace_printk: calicc879659490-E: IP s=a64330c d=c0a8007f
coredns-1678826  bpf_trace_printk: calicc879659490-E: IP ihl=20 bytes
coredns-1678826  bpf_trace_printk: calicc879659490-E: UDP; ports: s=53 d=58218
coredns-1678826  bpf_trace_printk: calicc879659490-E: CT: lookup from a64330c:53
coredns-1678826  bpf_trace_printk: calicc879659490-E: CT: lookup to   c0a8007f:58218
coredns-1678826  bpf_trace_printk: calicc879659490-E: CT: tun_ip:0
coredns-1678826  bpf_trace_printk: calicc879659490-E: CT: Hit! NAT REV entry but not connection opener: ESTABLISHED.
coredns-1678826  bpf_trace_printk: calicc879659490-E: CT: First response packet? ifindex=18
coredns-1678826  bpf_trace_printk: calicc879659490-E: Host RPF check src=a64330c skb loose if 18
coredns-1678826  bpf_trace_printk: calicc879659490-E: Host RPF check src=a64330c skb iface=18
coredns-1678826  bpf_trace_printk: calicc879659490-E: Host RPF check rc 0 result 1
coredns-1678826  bpf_trace_printk: calicc879659490-E: CT: result: 0x2002
coredns-1678826  bpf_trace_printk: calicc879659490-E: conntrack entry flags 0x100
coredns-1678826  bpf_trace_printk: calicc879659490-E: CT Hit
coredns-1678826  bpf_trace_printk: calicc879659490-E: jump to idx 2 prog at 79
coredns-1678826  bpf_trace_printk: calicc879659490-E: Entering calico_tc_skb_accepted_entrypoint
coredns-1678826  bpf_trace_printk: calicc879659490-E: Entering calico_tc_skb_accepted
coredns-1678826  bpf_trace_printk: calicc879659490-E: src=a64330c dst=c0a8007f
coredns-1678826  bpf_trace_printk: calicc879659490-E: post_nat=0:0
coredns-1678826  bpf_trace_printk: calicc879659490-E: tun_ip=0
coredns-1678826  bpf_trace_printk: calicc879659490-E: pol_rc=1
coredns-1678826  bpf_trace_printk: calicc879659490-E: sport=53
coredns-1678826  bpf_trace_printk: calicc879659490-E: dport=58218
coredns-1678826  bpf_trace_printk: calicc879659490-E: flags=20
coredns-1678826  bpf_trace_printk: calicc879659490-E: ct_rc=2
coredns-1678826  bpf_trace_printk: calicc879659490-E: ct_related=0
coredns-1678826  bpf_trace_printk: calicc879659490-E: mark=0x1000000
coredns-1678826  bpf_trace_printk: calicc879659490-E: ip->ttl 64
coredns-1678826  bpf_trace_printk: calicc879659490-E: FIB family=2
coredns-1678826  bpf_trace_printk: calicc879659490-E: FIB tot_len=0
coredns-1678826  bpf_trace_printk: calicc879659490-E: FIB ifindex=18
coredns-1678826  bpf_trace_printk: calicc879659490-E: FIB l4_protocol=17
coredns-1678826  bpf_trace_printk: calicc879659490-E: FIB sport=53
coredns-1678826  bpf_trace_printk: calicc879659490-E: FIB dport=58218
coredns-1678826  bpf_trace_printk: calicc879659490-E: FIB ipv4_src=a64330c
coredns-1678826  bpf_trace_printk: calicc879659490-E: FIB ipv4_dst=c0a8007f
coredns-1678826  bpf_trace_printk: calicc879659490-E: Traffic is towards the host namespace, doing Linux FIB lookup
coredns-1678826  bpf_trace_printk: calicc879659490-E: FIB lookup failed (FIB problem): 4.
coredns-1678826  bpf_trace_printk: calicc879659490-E: Traffic is towards host namespace, marking with 0x1000000.
coredns-1678826  bpf_trace_printk: calicc879659490-E: Final result=ALLOW (0). Program execution time: 39861ns 

4. This is the error section. There is no log confirming arrival at eth0.

Instead, an ICMP log appears out of nowhere:

coredns-1678826  bpf_trace_printk: calicc879659490-I: New packet at ifindex=18; mark=8000000
coredns-1678826  bpf_trace_printk: calicc879659490-I: IP id=18895
coredns-1678826  bpf_trace_printk: calicc879659490-I: IP s=c0a8007f d=a64330c
coredns-1678826  bpf_trace_printk: calicc879659490-I: IP ihl=20 bytes
coredns-1678826  bpf_trace_printk: calicc879659490-I: ICMP; type=3 code=3
coredns-1678826  bpf_trace_printk: calicc879659490-I: CT: lookup from c0a8007f:0
coredns-1678826  bpf_trace_printk: calicc879659490-I: CT: lookup to   a64330c:771
coredns-1678826  bpf_trace_printk: calicc879659490-I: CT-ICMP: proto 17
coredns-1678826  bpf_trace_printk: calicc879659490-I: CT: related lookup from a64330c:53
coredns-1678826  bpf_trace_printk: calicc879659490-I: CT: related lookup to   c0a8007f:58218
coredns-1678826  bpf_trace_printk: calicc879659490-I: CT: tun_ip:0
coredns-1678826  bpf_trace_printk: calicc879659490-I: CT: result: 0x4
coredns-1678826  bpf_trace_printk: calicc879659490-I: CT: result: related
coredns-1678826  bpf_trace_printk: calicc879659490-I: conntrack entry flags 0x100
coredns-1678826  bpf_trace_printk: calicc879659490-I: CT Hit
coredns-1678826  bpf_trace_printk: calicc879659490-I: jump to idx 2 prog at 91
coredns-1678826  bpf_trace_printk: calicc879659490-I: Entering calico_tc_skb_accepted_entrypoint
coredns-1678826  bpf_trace_printk: calicc879659490-I: Entering calico_tc_skb_accepted
coredns-1678826  bpf_trace_printk: calicc879659490-I: src=c0a8007f dst=a64330c
coredns-1678826  bpf_trace_printk: calicc879659490-I: post_nat=0:0
coredns-1678826  bpf_trace_printk: calicc879659490-I: tun_ip=0
coredns-1678826  bpf_trace_printk: calicc879659490-I: pol_rc=1
coredns-1678826  bpf_trace_printk: calicc879659490-I: sport=53
coredns-1678826  bpf_trace_printk: calicc879659490-I: dport=58218
coredns-1678826  bpf_trace_printk: calicc879659490-I: flags=20
coredns-1678826  bpf_trace_printk: calicc879659490-I: ct_rc=4
coredns-1678826  bpf_trace_printk: calicc879659490-I: ct_related=1
coredns-1678826  bpf_trace_printk: calicc879659490-I: mark=0x1000000
coredns-1678826  bpf_trace_printk: calicc879659490-I: ip->ttl 64
coredns-1678826  bpf_trace_printk: calicc879659490-I: jump to idx 6 prog at 94
coredns-1678826  bpf_trace_printk: calicc879659490-I: Entering calico_tc_skb_icmp_inner_nat
coredns-1678826  bpf_trace_printk: calicc879659490-I: IP id=18895
coredns-1678826  bpf_trace_printk: calicc879659490-I: IP s=c0a8007f d=a64330c
coredns-1678826  bpf_trace_printk: calicc879659490-I: IP ihl=20 bytes
coredns-1678826  bpf_trace_printk: calicc879659490-I: CT: DNAT to afe000a:53
coredns-1678826  bpf_trace_printk: calicc879659490-I: Fixing UDP source port from 53 to 58218
coredns-1678826  bpf_trace_printk: calicc879659490-I: DNAT L3 csum at 52 L4 csum at 0
coredns-1678826  bpf_trace_printk: calicc879659490-I: IP id=18895
coredns-1678826  bpf_trace_printk: calicc879659490-I: IP s=c0a8007f d=a64330c
coredns-1678826  bpf_trace_printk: calicc879659490-I: IP ihl=20 bytes
coredns-1678826  bpf_trace_printk: calicc879659490-I: Final result=ALLOW (0). Program execution time: 28593ns
containerd-8707  bpf_trace_printk: lo--------------E: New packet at ifindex=1; mark=8000000
containerd-8707  bpf_trace_printk: lo--------------E: IP id=41096
containerd-8707  bpf_trace_printk: lo--------------E: IP s=7f000001 d=7f000001
containerd-8707  bpf_trace_printk: lo--------------E: IP ihl=20 bytes
containerd-8707  bpf_trace_printk: lo--------------E: Allowing because it is not UDP
containerd-8707  bpf_trace_printk: lo--------------E: Final result=ALLOW (0). Program execution time: 3993ns
containerd-8707  bpf_trace_printk: lo--------------E: New packet at ifindex=1; mark=8000000
containerd-8707  bpf_trace_printk: lo--------------E: IP id=21420
containerd-8707  bpf_trace_printk: lo--------------E: IP s=7f000001 d=7f000001
containerd-8707  bpf_trace_printk: lo--------------E: IP ihl=20 bytes
containerd-8707  bpf_trace_printk: lo--------------E: Allowing because it is not UDP
containerd-8707  bpf_trace_printk: lo--------------E: Final result=ALLOW (0). Program execution time: 2769ns
containerd-8707  bpf_trace_printk: lo--------------E: New packet at ifindex=1; mark=8000000
containerd-8707  bpf_trace_printk: lo--------------E: IP id=41097
containerd-8707  bpf_trace_printk: lo--------------E: IP s=7f000001 d=7f000001
containerd-8707  bpf_trace_printk: lo--------------E: IP ihl=20 bytes
containerd-8707  bpf_trace_printk: lo--------------E: Allowing because it is not UDP
containerd-8707  bpf_trace_printk: lo--------------E: Final result=ALLOW (0). Program execution time: 2474ns
containerd-8707  bpf_trace_printk: lo--------------E: New packet at ifindex=1; mark=8000000
containerd-8707  bpf_trace_printk: lo--------------E: IP id=21421
containerd-8707  bpf_trace_printk: lo--------------E: IP s=7f000001 d=7f000001
containerd-8707  bpf_trace_printk: lo--------------E: IP ihl=20 bytes
containerd-8707  bpf_trace_printk: lo--------------E: Allowing because it is not UDP
containerd-8707  bpf_trace_printk: lo--------------E: Final result=ALLOW (0). Program execution time: 2720n

Possible Solution

There is no problem with Calico v3.24.1. Simply changing the version to v3.24.1 in the same cluster makes it work fine.
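
For completeness, a hedged sketch of how the version comparison can be done on a manifest-based install (this assumes the DaemonSet and container are both named calico-node and that the image is calico/node; an operator-managed install would be changed through its Installation resource instead):

# check the currently deployed calico-node image
kubectl -n kube-system get daemonset calico-node -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
# roll the DaemonSet to v3.24.1 for the comparison
kubectl -n kube-system set image daemonset/calico-node calico-node=calico/node:v3.24.1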

Steps to Reproduce (for bugs)

  1. Deploy Pod
    apiVersion: v1
    kind: Pod
    metadata:
      name: netshoot
      labels:
        app: netshoot
    spec:
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        imagePullPolicy: IfNotPresent
        command: ["/bin/sleep"]
        args: ["3650d"]
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      restartPolicy: Always
  2. kubectl exec -it netshoot -- bash
  3. nslookup kubernetes

Context

k8s cluster networking error

Your Environment

tomastigera commented 1 month ago

In your route -n output I would expect to see 10.254.0.10 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali. Without that route the rest cannot quite work and the packets are bounced around. Could you share the calico logs from that node? They should show that it is trying to program this route for the service and that it fails somehow. I think it might be related to this fix: https://github.com/projectcalico/calico/pull/8983. You are likely to see the issue with any UDP service accessed from a host-networked pod.
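
A minimal way to check for that route on the affected node (a sketch; the expected entry is the one quoted in the comment above, not output from the reporter's environment):

# look for the per-service route towards the BPF NAT interface
route -n | grep bpfin.cali
ip route show | grep bpfin.cali
# expected if felix had programmed it (assumption based on the comment above):
# 10.254.0.10 via 169.254.1.1 dev bpfin.cali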

tomastigera commented 1 month ago

[diagram: CalicoCon 2024 - Calico eBPF (1)]

Without that route, the packet is routed via the default route (eth0), where it happens to be NATed and turned around, and thus reaches the local service pod. But on the way back it does not get NATed back (because it does not follow the expected path) and instead lands on the host, where there is no socket to take it, so the host generates an ICMP type 3 code 3 response (destination unreachable, port unreachable).
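
One way to observe that turnaround on the node (a sketch, not taken from the original report) is to watch for the port-unreachable responses while the nslookup is running:

# capture ICMP destination/port-unreachable packets on any interface
tcpdump -ni any 'icmp[icmptype] == icmp-unreach'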

iksoon-park commented 1 month ago

@tomastigera Thanks for the reply. But I think it's different from issue https://github.com/projectcalico/calico/pull/8983.

The meaning of each IP in the test in the main text is as follows:

As written in the main text, the routing table of the worker node where the problem occurred is as follows.

[screenshot]

I checked the routing table, but the route you mentioned is not present.

As shown in the BPF log in the main text, when the initial request is made, the "10.254.0.10" service ClusterIP is correctly changed to the CoreDNS pod IP "10.100.51.12" by BPF NAT.

1. Sending packets from host to coreDNS

Changed by NAT as follows:

Afterwards, the response comes back from the CoreDNS pod IP "10.100.51.12" rather than from the service IP "10.254.0.10".

Please check the status of the BPF NAT table and the Routing table.

[screenshot]

I ran tcpdump on the bpfin.cali and bpfout.cali NICs and, as expected, no packets were observed, since there is no such route in the routing table. The following figure shows this.

[screenshot]

I am also providing the calico-node pod logs from the cluster where the issue occurred. calico-node.log

This issue is resolved and everything works correctly when switching to Calico v3.24.1 in the same cluster. The issue occurs in versions 3.27 through 3.28. Could you please check again?

sridhartigera commented 1 month ago

@iksoon-park The missing route is the issue. As mentioned above, although the packet gets NATed to the correct pod IP, the return path is unexpected and results in failure. We need to find out why the routes are missing.

Do you see the below logs in calico-node? Remove old route dest=10.254.0.10 ifaceName="bpfin.cali" ifaceRegex="bpfin.cali". These logs might appear after some time.

Did you try with 3.28.1?

iksoon-park commented 1 month ago

@sridhartigera , @tomastigera

I understand. However, even after some time, the following log is not observed in calico-node: Remove old route dest=10.254.0.10 ifaceName="bpfin.cali" ifaceRegex="bpfin.cali"

The same issue occurs in Calico v3.28.1. I have also checked all versions of 3.27, and the same problem occurs.

"10.254.0.10" corresponds to the ClusterIP of Kubernetes. As far as I know, the rule for this IP is not created in the routing table but is translated into the Pod IP by the NAT in the BPF Map.

Thus, I believe this issue is caused by Calico eBPF. Is there anything I might be misunderstanding?

sridhartigera commented 1 month ago

@iksoon-park Please provide your FelixConfiguration and the calico-node debug logs. Set logSeverityScreen to Debug in the FelixConfiguration.
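
A minimal sketch of one way to do that (assuming calicoctl is configured against the same etcd datastore used by the cluster, as shown later in this thread):

# turn on felix debug logging cluster-wide
calicoctl patch felixconfiguration default --patch '{"spec":{"logSeverityScreen":"Debug"}}'
# then collect the logs from the calico-node pod on the affected node
kubectl logs -n kube-system calico-node-76h79 > calico-node-debug.log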

iksoon-park commented 1 month ago

@sridhartigera

As mentioned in the main text, the Calico configuration in the test environment is as follows (Calico Setting).

My FelixConfiguration settings are as follows, and the calico-node environment variables are set as below:

NODENAME=${node_name}
CALICO_NETWORKING_BACKEND=bird
IP_AUTODETECTION_METHOD=interface=eth0
IP=autodetect
ETCD_ENDPOINTS=${etcd_endpoint}
ETCD_CERT_FILE=${file_path_cert.crt}
ETCD_CA_CERT_FILE=${file_path_ca_cert.crt}
ETCD_KEY_FILE=${file_path_key.pem}
ETCD_DISCOVERY_SRV=
CALICO_MANAGE_CNI=false
DATASTORE_TYPE=etcdv3
CALICO_IPV4POOL_IPIP=Never
CALICO_IPV4POOL_VXLAN=Never
CALICO_IPV6POOL_VXLAN=Never
FELIX_IPINIPMTU=0
FELIX_VXLANMTU=0
FELIX_WIREGUARDMTU=0
CALICO_IPV4POOL_CIDR=10.100.0.0/16
CALICO_IPV4POOL_BLOCK_SIZE=24
CALICO_DISABLE_FILE_LOGGING=true
FELIX_DEFAULTENDPOINTTOHOSTACTION=ACCEPT
FELIX_IPV6SUPPORT=false
FELIX_HEALTHENABLED=true
FELIX_BPFENABLED=true
FELIX_BPFDISABLEUNPRIVILEGED=true
FELIX_BPFKUBEPROXYIPTABLESCLEANUPENABLED=true
FELIX_BPFKUBEPROXYENDPOINTSLICESENABLED=true
FELIX_BPFDATAIFACEPATTERN=eth0
FELIX_XDPENABLED=true
FELIX_BPFLOGLEVEL=Debug

The configuration confirmed with calicoctl is as follows.

[calicoctl get felixconfiguration]
NAME
default
node.iksoon-27-default-worker-node-0
node.iksoon-27-master-0
node.iksoon-27-master-1
node.iksoon-27-master-2

[calicoctl get felixconfiguration default  -o yaml]
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  creationTimestamp: "2024-08-27T04:30:42Z"
  name: default
  resourceVersion: "240"
  uid: 329eeeb3-5703-4e1e-a14e-937b8f6da84a
spec:
  bpfConnectTimeLoadBalancing: TCP
  bpfHostNetworkedNATWithoutCTLB: Enabled
  bpfLogLevel: ""
  floatingIPs: Disabled
  logSeverityScreen: Info
  reportingInterval: 0s

[calicoctl get felixconfiguration node.iksoon-27-master-0 -o yaml]
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  creationTimestamp: "2024-08-27T04:30:42Z"
  name: node.iksoon-27-master-0
  resourceVersion: "241"
  uid: 01a765c1-6237-44dd-a7c1-eab0b28da8dd
spec:
  bpfConnectTimeLoadBalancing: TCP
  bpfHostNetworkedNATWithoutCTLB: Enabled
  bpfLogLevel: ""
  defaultEndpointToHostAction: Return
  floatingIPs: Disabled

[calicoctl get felixconfiguration node.iksoon-27-default-worker-node-0 -oyaml]
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  creationTimestamp: "2024-08-27T04:36:38Z"
  name: node.iksoon-27-default-worker-node-0
  resourceVersion: "1367"
  uid: d208f176-cf76-4ad5-a1c5-3792d44c28cd
spec:
  bpfConnectTimeLoadBalancing: TCP
  bpfHostNetworkedNATWithoutCTLB: Enabled
  bpfLogLevel: ""
  defaultEndpointToHostAction: Return
  floatingIPs: Disabled

The log content related to 10.254.0.10 is as follows. As I expected, it is handled by BPF NAT and no route is programmed in the routing table.

2024-09-17 14:48:42.993 [DEBUG][67] felix/syncer.go 1046: resolved NATKey{Proto:6 Addr:10.254.0.10 Port:53 SrcAddr:0.0.0.0/0} as kube-system/kube-dns:dns-tcp
2024-09-17 14:48:42.993 [DEBUG][67] felix/syncer.go 1046: resolved NATKey{Proto:17 Addr:10.254.0.10 Port:53 SrcAddr:0.0.0.0/0} as kube-system/kube-dns:dns
2024-09-17 14:48:42.993 [DEBUG][67] felix/syncer.go 1046: resolved NATKey{Proto:6 Addr:10.254.0.10 Port:9153 SrcAddr:0.0.0.0/0} as kube-system/kube-dns:metrics

2024-09-17 14:48:42.994 [DEBUG][67] felix/syncer.go 585: Applying new state, {map[default/iksoon-nginx-service:10.254.152.135:8081/TCP default/kubernetes:https:10.254.0.1:443/TCP kube-system/calico-typha:calico-typha:10.254.127.49:5473/TCP kube-system/csi-cinder-controller-service:dummy:10.254.200.217:12345/TCP kube-system/kube-dns:dns:10.254.0.10:53/UDP kube-system/kube-dns:dns-tcp:10.254.0.10:53/TCP kube-system/kube-dns:metrics:10.254.0.10:9153/TCP kube-system/metrics-server:10.254.52.186:443/TCP] map[default/iksoon-nginx-service:[10.100.51.8:80 10.100.51.9:80] default/kubernetes:https:[192.168.0.106:6443 192.168.0.130:6443 192.168.0.28:6443] kube-system/calico-typha:calico-typha:[192.168.0.127:5473] kube-system/csi-cinder-controller-service:dummy:[10.100.51.2:12345] kube-system/kube-dns:dns:[10.100.51.0:53 10.100.51.12:53] kube-system/kube-dns:dns-tcp:[10.100.51.0:53 10.100.51.12:53] kube-system/kube-dns:metrics:[10.100.51.0:9153 10.100.51.12:9153] kube-system/metrics-server:[10.100.51.3:443]] kr-pub-a}

2024-09-17 14:48:42.994 [DEBUG][67] felix/syncer.go 942: bpf map writing NATKey{Proto:17 Addr:10.254.0.10 Port:53 SrcAddr:0.0.0.0/0}:NATValue{ID:2,Count:2,LocalCount:2,AffinityTimeout:0,Flags:{}}
2024-09-17 14:48:42.994 [DEBUG][67] felix/delta_tracker.go 125: Set bpfMap="cali_v4_nat_fe3" k=NATKey{Proto:17 Addr:10.254.0.10 Port:53 SrcAddr:0.0.0.0/0} v=NATValue{ID:2,Count:2,LocalCount:2,AffinityTimeout:0,Flags:{}}

2024-09-17 14:48:42.994 [DEBUG][67] felix/syncer.go 942: bpf map writing NATKey{Proto:6 Addr:10.254.0.10 Port:53 SrcAddr:0.0.0.0/0}:NATValue{ID:3,Count:2,LocalCount:2,AffinityTimeout:0,Flags:{}}
2024-09-17 14:48:42.994 [DEBUG][67] felix/delta_tracker.go 125: Set bpfMap="cali_v4_nat_fe3" k=NATKey{Proto:6 Addr:10.254.0.10 Port:53 SrcAddr:0.0.0.0/0} v=NATValue{ID:3,Count:2,LocalCount:2,AffinityTimeout:0,Flags:{}}

2024-09-17 14:48:42.995 [DEBUG][67] felix/syncer.go 942: bpf map writing NATKey{Proto:6 Addr:10.254.0.10 Port:9153 SrcAddr:0.0.0.0/0}:NATValue{ID:4,Count:2,LocalCount:2,AffinityTimeout:0,Flags:{}}
2024-09-17 14:48:42.995 [DEBUG][67] felix/delta_tracker.go 125: Set bpfMap="cali_v4_nat_fe3" k=NATKey{Proto:6 Addr:10.254.0.10 Port:9153 SrcAddr:0.0.0.0/0} v=NATValue{ID:4,Count:2,LocalCount:2,AffinityTimeout:0,Flags:{}}

I am also providing the calico-node log file with the logSeverityScreen setting set to Debug. calico-node-log.tar.gz

Please check and confirm.

tomastigera commented 1 month ago

The logs show that calico-node does not get any update about the service, so we never program the route. kube-proxy does get the update, but it gets it directly from the Kubernetes apiserver. You, however, are using Typha and etcd. It seems that Typha only sends updates about endpoints and not about services. You would have to figure out whether etcd has the services.
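
A hedged sketch of checking that (it assumes the etcdv3 datastore is reachable with the same ETCD_* certificate files shown later in this thread, and simply lists Calico's keys to see whether anything service-related is stored there):

ETCDCTL_API=3 etcdctl \
  --endpoints="$ETCD_ENDPOINTS" \
  --cacert="$ETCD_CA_CERT_FILE" --cert="$ETCD_CERT_FILE" --key="$ETCD_KEY_FILE" \
  get /calico/ --prefix --keys-only | grep -i service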

Could you describe your exact setup? What k8s platform and version do you use? You mentioned a few.

Note that not having service information propagated to calico-node may affect not just networking but policy as well.

matthewdupre commented 1 month ago

I recommend using KDD mode: it's the mainline path and receives far more testing attention. etcd mode has been generally unnecessary for many years: Typha exists to mitigate the k8s API Server bottlenecks.

The install docs also advise against it - see https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises for example. I'm not sure if Typha+etcd is even documented anywhere at all.

If you're able to switch to KDD that would be my suggestion - if not, could you please describe why you need to use etcd mode? Thanks

tomastigera commented 1 month ago

Turns out that this specific feature does not work with etcd. If you need to use etcd, set bpfConnectTimeLoadBalancing=Enabled and bpfHostNetworkedNATWithoutCTLB=Disabled. However, you may experience some connectivity issues with DNS if a DNS backend that is actively used dies or migrates to a different node; see https://github.com/projectcalico/calico/issues/4509
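
A minimal sketch of applying that workaround with calicoctl (this patches the default FelixConfiguration; note that the per-node FelixConfigurations shown earlier in this thread also set these two fields and may need the same change):

calicoctl patch felixconfiguration default --patch '{"spec":{"bpfConnectTimeLoadBalancing":"Enabled","bpfHostNetworkedNATWithoutCTLB":"Disabled"}}'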

iksoon-park commented 1 month ago

@tomastigera Thank you for your response. I have confirmed that everything is working properly with the settings you suggested.

I am currently testing one potential issue, but even when I redeploy or move CoreDNS to a different node, no problems occur.

Here is my test case. Deploy the test server pod:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: iksoon-deployment-nginx
  labels:
    app: iksoon-nginx-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: iksoon-pod-nginx
  template:
    metadata:
      labels:
        app: iksoon-pod-nginx
    spec:
      containers:
      - name: iksoon-nginx
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: iksoon-nginx-service
spec:
  ports:
  - port: 8081
    targetPort: 80
  selector:
    app: iksoon-pod-nginx

Deploy the test client pod:

apiVersion: v1
kind: Pod
metadata:
  name: netshoot
  labels:
    app: netshoot
spec:
  containers:
  - name: netshoot
    image: nicolaka/netshoot
    imagePullPolicy: IfNotPresent
    command: ["/bin/sleep"]
    args: ["3650d"]
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
  restartPolicy: Always

I attempted the following steps:

  1. Send a request every 0.5 seconds after connecting to netshoot

    kubectl exec -it netshoot -- bash
    while true ; do curl -s -o /dev/null -w "%{http_code}\n" iksoon-nginx-service:8081/ ; sleep 0.5 ; done
  2. coreDNS rollout

    kubectl rollout restart deployment/coredns -n kube-system    (move CoreDNS to a different node)

    result : OK (200 return)

  3. test server pod rollout

    kubectl rollout restart deployment.apps/iksoon-deployment-nginx

    result : OK (200 return)

I’ve tried multiple times, but the results are always normal. Has the issue with https://github.com/projectcalico/calico/issues/4509 been resolved? Or did I perform the test incorrectly?

How can I reproduce the issue you mentioned earlier?

tomastigera commented 1 month ago

It depends on how the application uses DNS. If the application uses connect() with UDP, one backend is picked for the life of the socket. If the app never creates a new socket after it stops getting responses (which is what would pick a new backend), it gets stuck.

In most cases it is not an issue, but in some situations it is, so our default is to avoid the "some". Many deployments used to live without that happily before it got reported. You may be the lucky one.