Closed renan closed 3 years ago
@tomastigera you seem to have been involved in the linked Slack conversation, so assigning to you :)
Looks like we're having a similar issue or at least it has exactly the same symptoms. We're on Calico 3.16.5 with eBPF and DSR mode on in production yet, but 3.17.2 in dev/qa shows the same symptoms.
We're seeing odd connection resets, where TCP RST would be issued by a pod shortly after TCP connection is established to the pod (via service IP from outside the cluster). There's always a specific pattern happening prior to this, though: when a connection is being established from a load balancer (sitting outside the cluster) to a service IP in k8s, the TCP SYN/ACK from a pod is lost and the load balancer is re-transmitting its original TCP SYN, it goes to another pod, and that pod acks it and sends SYN/ACK back, and then there's an ACK (sometimes followed by payload too) from the load balancer to complete the threeway handshake, and the pod issues a TCP RST right away. And it happens to many services regardless of their language/framework/webserver, so we believe it's something on the lower level, where calico is managing it. I want to reiterate - that pattern always happens when connections get reset unexpectedly.
I have pcaps to support the case. Anything I can help with to troubleshoot and fix this?
@renan are you using DSR mode too (or @emptywee does it also happen with DSR disabled?)
@fasaxc I'll disable it in dev/qa and test (it's a bit harder to reproduce in dev/qa due to overall less busier cluster traffic-wise). I am actually thinking to add an iptables rule with slight chance of dropping incoming SYN/ACK from a chosen service IP to speed up the occurrence of this bug. I'll keep you posted!
Currently, each new incoming SYN gets load balanced independently; wonder if we should re-use the same backend to avoid any cross-talk.
Currently, each new incoming SYN gets load balanced independently; wonder if we should re-use the same backend to avoid any cross-talk.
Technically, I don't see why we should, but it should not be the cause for connection resets that's for sure.
What I am saying is, if we do re-use the same backend and helps fixing the issue, I am down for it. But technically, it should work regardless if we re-use or not, right?
@renan are you using DSR mode too (or @emptywee does it also happen with DSR disabled?)
Tried with DSR mode off, same results, same pattern.
My calico configuration:
$ calicoctl version
Client Version: v3.17.2
Git commit: 5eb50600
Cluster Version: v3.17.2
Cluster Type: typha,kdd,k8s,bgp,lvd
$ calicoctl get felixconfiguration default -o yaml
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
creationTimestamp: "2019-04-11T23:49:11Z"
name: default
resourceVersion: "271258664"
uid: 6781e0bc-5cb4-11e9-91a4-005056a41089
spec:
bpfEnabled: true
bpfExternalServiceMode: Tunnel
bpfKubeProxyIptablesCleanupEnabled: true
bpfLogLevel: Debug
ipipEnabled: false
logSeverityScreen: Debug
reportingInterval: 0s
vxlanEnabled: false
Attaching a pcap file:
Context for the pcap file:
Taken from the worker node where the pod is running with -i any
flag to tcpdump.
10.152.53.38
- load balancer IP address (btw, it sits in the same subnet with the k8s workers and master nodes, and it has bird running on it, so there's no firewall in between it and the pods/services it talks to).
10.158.223.131
- pod IP address where reset happened.
and here's relevant eBPF debug logs pertaining to one of the resets we are seeing:
<idle>-0 [001] ..s. 160412.041002: 0: ens192---I: IP id=4344 s=a98351f d=a9835c6
<idle>-0 [001] ..s. 160412.041027: 0: ens192---I: VXLAN tunnel packet to c635980a (host IP=c635980a)
<idle>-0 [001] ..s. 160412.041033: 0: ens192---I: ARP update for ifindex 2 ip a98351f
<idle>-0 [001] ..s. 160412.041034: 0: ens192---I: vxlan decap
<idle>-0 [001] ..s. 160412.041037: 0: ens192---I: IP id=4344 s=a983526 d=a9e04c1
<idle>-0 [001] ..s. 160412.041038: 0: ens192---I: vxlan decap origin a98351f
<idle>-0 [001] ..s. 160412.041040: 0: ens192---I: TCP; ports: s=46334 d=80
<idle>-0 [001] ..s. 160412.041042: 0: ens192---I: CT-6 lookup from a983526:46334
<idle>-0 [001] ..s. 160412.041043: 0: ens192---I: CT-6 lookup to a9e04c1:80
<idle>-0 [001] ..s. 160412.041044: 0: ens192---I: CT-6 Packet is a SYN, short-circuiting lookup.
<idle>-0 [001] ..s. 160412.041045: 0: ens192---I: CT-6 result: NEW.
<idle>-0 [001] ..s. 160412.041052: 0: ens192---I: NAT: 1st level lookup addr=a9e04c1 port=80 protocol=6.
<idle>-0 [001] ..s. 160412.041054: 0: ens192---I: NAT: 1st level hit; id=26
<idle>-0 [001] ..s. 160412.041056: 0: ens192---I: NAT: 1st level hit; id=26 ordinal=0
<idle>-0 [001] ..s. 160412.041059: 0: ens192---I: NAT: backend selected a9edf83:8080
<idle>-0 [001] ..s. 160412.041062: 0: ens192---I: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160412.041063: 0: ens192---I: src=a983526 dst=a9e04c1
<idle>-0 [001] ..s. 160412.041064: 0: ens192---I: post_nat=a9edf83:8080
<idle>-0 [001] ..s. 160412.041065: 0: ens192---I: tun_ip=1f35980a
<idle>-0 [001] ..s. 160412.041066: 0: ens192---I: pol_rc=1
<idle>-0 [001] ..s. 160412.041066: 0: ens192---I: sport=46334
<idle>-0 [001] ..s. 160412.041067: 0: ens192---I: flags=0
<idle>-0 [001] ..s. 160412.041068: 0: ens192---I: ct_rc=0
<idle>-0 [001] ..s. 160412.041069: 0: ens192---I: ct_related=0
<idle>-0 [001] ..s. 160412.041070: 0: ens192---I: ip->ttl 63
<idle>-0 [001] ..s. 160412.041072: 0: ens192---I: Allowed by normal policy: ACCEPT
<idle>-0 [001] ..s. 160412.041073: 0: ens192---I: CT: DNAT to a9edf83:8080
<idle>-0 [001] ..s. 160412.041077: 0: ens192---I: rt found for 0xa9edf83 local 1
<idle>-0 [001] ..s. 160412.041078: 0: ens192---I: CT-ALL packet mark is: 0x0
<idle>-0 [001] ..s. 160412.041080: 0: ens192---I: CT-ALL Creating tracking entry type 2 at 160410852691302.
<idle>-0 [001] ..s. 160412.041082: 0: ens192---I: CT-ALL tracking entry flags 0x0
<idle>-0 [001] ..s. 160412.041083: 0: ens192---I: CT-ALL nat tunneled from a98351f
<idle>-0 [001] ..s. 160412.041085: 0: ens192---I: NEW src_to_dst->ifindex 2
<idle>-0 [001] ..s. 160412.041086: 0: ens192---I: CT-ALL Whitelisted source side - from HEP tun fwd=0
<idle>-0 [001] ..s. 160412.041091: 0: ens192---I: CT-6 Creating FWD entry at 160410852701762.
<idle>-0 [001] ..s. 160412.041095: 0: ens192---I: L4 checksum update (csum is at 50) IP from a9e04c1 to a9edf83
<idle>-0 [001] ..s. 160412.041097: 0: ens192---I: bpf_l4_csum_replace(IP): 0
<idle>-0 [001] ..s. 160412.041099: 0: ens192---I: L4 checksum update (csum is at 50) port from 80 to 8080
<idle>-0 [001] ..s. 160412.041100: 0: ens192---I: bpf_l4_csum_replace(port): 0
<idle>-0 [001] ..s. 160412.041103: 0: ens192---I: Traffic is towards host namespace, marking with c3400000.
<idle>-0 [001] ..s. 160412.041104: 0: ens192---I: Final result=ALLOW (-1). Program execution time: 106588ns
<idle>-0 [001] ..s. 160412.041165: 0: cali2c7c-I: IP id=4344 s=a983526 d=a9edf83
<idle>-0 [001] ..s. 160412.041167: 0: cali2c7c-I: TCP; ports: s=46334 d=8080
<idle>-0 [001] ..s. 160412.041168: 0: cali2c7c-I: CT-6 lookup from a983526:46334
<idle>-0 [001] ..s. 160412.041170: 0: cali2c7c-I: CT-6 lookup to a9edf83:8080
<idle>-0 [001] ..s. 160412.041171: 0: cali2c7c-I: CT-6 Packet is a SYN, short-circuiting lookup.
<idle>-0 [001] ..s. 160412.041171: 0: cali2c7c-I: CT-6 result: NEW.
<idle>-0 [001] ..s. 160412.041175: 0: cali2c7c-I: About to jump to policy program; lack of further logs means policy dropped the packet...
<idle>-0 [001] ..s. 160412.041177: 0: cali2c7c-I: Entering calico_tc_skb_accepted_entrypoint
<idle>-0 [001] ..s. 160412.041179: 0: cali2c7c-I: IP id=4344 s=a983526 d=a9edf83
<idle>-0 [001] ..s. 160412.041180: 0: cali2c7c-I: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160412.041181: 0: cali2c7c-I: src=a983526 dst=a9edf83
<idle>-0 [001] ..s. 160412.041182: 0: cali2c7c-I: post_nat=a9edf83:8080
<idle>-0 [001] ..s. 160412.041183: 0: cali2c7c-I: tun_ip=0
<idle>-0 [001] ..s. 160412.041183: 0: cali2c7c-I: pol_rc=1
<idle>-0 [001] ..s. 160412.041184: 0: cali2c7c-I: sport=46334
<idle>-0 [001] ..s. 160412.041185: 0: cali2c7c-I: flags=0
<idle>-0 [001] ..s. 160412.041185: 0: cali2c7c-I: ct_rc=0
<idle>-0 [001] ..s. 160412.041186: 0: cali2c7c-I: ct_related=0
<idle>-0 [001] ..s. 160412.041187: 0: cali2c7c-I: ip->ttl 62
<idle>-0 [001] ..s. 160412.041188: 0: cali2c7c-I: Allowed by normal policy: ACCEPT
<idle>-0 [001] ..s. 160412.041190: 0: cali2c7c-I: CT-ALL packet mark is: 0xc3400000
<idle>-0 [001] ..s. 160412.041191: 0: cali2c7c-I: CT-ALL Asked to create entry but packet is marked as from another endpoint, doing lookup
<idle>-0 [001] ..s. 160412.041194: 0: cali2c7c-I: Final result=ALLOW (-1). Program execution time: 30904ns
<idle>-0 [001] ..s. 160412.041254: 0: cali2c7c-E: IP id=0 s=a9edf83 d=a983526
<idle>-0 [001] ..s. 160412.041256: 0: cali2c7c-E: TCP; ports: s=8080 d=46334
<idle>-0 [001] ..s. 160412.041259: 0: cali2c7c-E: CT-6 lookup from a9edf83:8080
<idle>-0 [001] ..s. 160412.041260: 0: cali2c7c-E: CT-6 lookup to a983526:46334
<idle>-0 [001] ..s. 160412.041263: 0: cali2c7c-E: CT-6 tun_ip:a98351f
<idle>-0 [001] ..s. 160412.041265: 0: cali2c7c-E: CT-6 Hit! NAT REV entry at ingress to connection opener: SNAT.
<idle>-0 [001] ..s. 160412.041267: 0: cali2c7c-E: REV src_to_dst->ifindex 9
<idle>-0 [001] ..s. 160412.041268: 0: cali2c7c-E: CT-6 result: 3
<idle>-0 [001] ..s. 160412.041269: 0: cali2c7c-E: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160412.041270: 0: cali2c7c-E: src=a9edf83 dst=a983526
<idle>-0 [001] ..s. 160412.041271: 0: cali2c7c-E: post_nat=0:0
<idle>-0 [001] ..s. 160412.041272: 0: cali2c7c-E: tun_ip=0
<idle>-0 [001] ..s. 160412.041273: 0: cali2c7c-E: pol_rc=0
<idle>-0 [001] ..s. 160412.041273: 0: cali2c7c-E: sport=8080
<idle>-0 [001] ..s. 160412.041274: 0: cali2c7c-E: flags=0
<idle>-0 [001] ..s. 160412.041275: 0: cali2c7c-E: ct_rc=3
<idle>-0 [001] ..s. 160412.041276: 0: cali2c7c-E: ct_related=0
<idle>-0 [001] ..s. 160412.041277: 0: cali2c7c-E: ip->ttl 64
<idle>-0 [001] ..s. 160412.041278: 0: cali2c7c-E: CT: SNAT from a9e04c1:80
<idle>-0 [001] ..s. 160412.041281: 0: cali2c7c-E: L4 checksum update (csum is at 50) IP from a9edf83 to a9e04c1
<idle>-0 [001] ..s. 160412.041282: 0: cali2c7c-E: bpf_l4_csum_replace(IP): 0
<idle>-0 [001] ..s. 160412.041284: 0: cali2c7c-E: L4 checksum update (csum is at 50) port from 8080 to 80
<idle>-0 [001] ..s. 160412.041285: 0: cali2c7c-E: bpf_l4_csum_replace(port): 0
<idle>-0 [001] ..s. 160412.041291: 0: cali2c7c-E: vxlan encap a9edf83 : a98351f
<idle>-0 [001] ..s. 160412.041293: 0: cali2c7c-E: vxlan return 1 ifindex_fwd 2
<idle>-0 [001] ..s. 160412.041296: 0: cali2c7c-E: Redirect directly to interface (2) succeeded.
<idle>-0 [001] ..s. 160412.041297: 0: cali2c7c-E: Traffic is towards host namespace, marking with c3500000.
<idle>-0 [001] ..s. 160412.041299: 0: cali2c7c-E: Final result=ALLOW (7). Program execution time: 46113ns
<idle>-0 [001] ..s. 160412.041303: 0: ens192---E: Packet approved for forward - src ip fixup
<idle>-0 [001] ..s. 160412.041304: 0: ens192---E: IP id=0 s=a9edf83 d=a98351f
<idle>-0 [001] ..s. 160412.041305: 0: ens192---E: src ip fixup a9835c6
<idle>-0 [001] ..s. 160412.041307: 0: ens192---E: Final result=ALLOW (-1). Program execution time: 4025ns
<idle>-0 [001] ..s. 160412.041340: 0: ens192---I: IP id=4345 s=a98351f d=a9835c6
<idle>-0 [001] ..s. 160412.041341: 0: ens192---I: VXLAN tunnel packet to c635980a (host IP=c635980a)
<idle>-0 [001] ..s. 160412.041344: 0: ens192---I: ARP update for ifindex 2 ip a98351f
<idle>-0 [001] ..s. 160412.041345: 0: ens192---I: vxlan decap
<idle>-0 [001] ..s. 160412.041347: 0: ens192---I: IP id=4345 s=a983526 d=a9e04c1
<idle>-0 [001] ..s. 160412.041348: 0: ens192---I: vxlan decap origin a98351f
<idle>-0 [001] ..s. 160412.041349: 0: ens192---I: TCP; ports: s=46334 d=80
<idle>-0 [001] ..s. 160412.041350: 0: ens192---I: CT-6 lookup from a983526:46334
<idle>-0 [001] ..s. 160412.041352: 0: ens192---I: CT-6 lookup to a9e04c1:80
<idle>-0 [001] ..s. 160412.041354: 0: ens192---I: CT-6 Hit! NAT FWD entry, doing secondary lookup.
<idle>-0 [001] ..s. 160412.041356: 0: ens192---I: CT-6 fwd tun_ip:a98351f
<idle>-0 [001] ..s. 160412.041359: 0: ens192---I: CT-6 result: 4
<idle>-0 [001] ..s. 160412.041360: 0: ens192---I: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160412.041361: 0: ens192---I: src=a983526 dst=a9e04c1
<idle>-0 [001] ..s. 160412.041362: 0: ens192---I: post_nat=0:0
<idle>-0 [001] ..s. 160412.041363: 0: ens192---I: tun_ip=1f35980a
<idle>-0 [001] ..s. 160412.041364: 0: ens192---I: pol_rc=0
<idle>-0 [001] ..s. 160412.041364: 0: ens192---I: sport=46334
<idle>-0 [001] ..s. 160412.041365: 0: ens192---I: flags=0
<idle>-0 [001] ..s. 160412.041366: 0: ens192---I: ct_rc=4
<idle>-0 [001] ..s. 160412.041366: 0: ens192---I: ct_related=0
<idle>-0 [001] ..s. 160412.041367: 0: ens192---I: ip->ttl 63
<idle>-0 [001] ..s. 160412.041369: 0: ens192---I: CT: DNAT to a9edf83:8080
<idle>-0 [001] ..s. 160412.041371: 0: ens192---I: L4 checksum update (csum is at 50) IP from a9e04c1 to a9edf83
<idle>-0 [001] ..s. 160412.041373: 0: ens192---I: bpf_l4_csum_replace(IP): 0
<idle>-0 [001] ..s. 160412.041374: 0: ens192---I: L4 checksum update (csum is at 50) port from 80 to 8080
<idle>-0 [001] ..s. 160412.041375: 0: ens192---I: bpf_l4_csum_replace(port): 0
<idle>-0 [001] ..s. 160412.041377: 0: ens192---I: IP id=4345 s=a983526 d=a9edf83
<idle>-0 [001] ..s. 160412.041379: 0: ens192---I: FIB family=2
<idle>-0 [001] ..s. 160412.041379: 0: ens192---I: FIB tot_len=52
<idle>-0 [001] ..s. 160412.041380: 0: ens192---I: FIB ifindex=2
<idle>-0 [001] ..s. 160412.041381: 0: ens192---I: FIB l4_protocol=6
<idle>-0 [001] ..s. 160412.041382: 0: ens192---I: FIB sport=46334
<idle>-0 [001] ..s. 160412.041382: 0: ens192---I: FIB dport=8080
<idle>-0 [001] ..s. 160412.041383: 0: ens192---I: FIB ipv4_src=a983526
<idle>-0 [001] ..s. 160412.041384: 0: ens192---I: FIB ipv4_dst=a9edf83
<idle>-0 [001] ..s. 160412.041385: 0: ens192---I: Traffic is towards the host namespace, doing Linux FIB lookup
<idle>-0 [001] ..s. 160412.041388: 0: ens192---I: FIB lookup succeeded
<idle>-0 [001] ..s. 160412.041390: 0: ens192---I: Got Linux FIB hit, redirecting to iface 9.
<idle>-0 [001] ..s. 160412.041392: 0: ens192---I: Traffic is towards host namespace, marking with c3400000.
<idle>-0 [001] ..s. 160412.041393: 0: ens192---I: Final result=ALLOW (7). Program execution time: 53793ns
<idle>-0 [001] ..s. 160412.041396: 0: cali2c7c-I: IP id=4345 s=a983526 d=a9edf83
<idle>-0 [001] ..s. 160412.041397: 0: cali2c7c-I: TCP; ports: s=46334 d=8080
<idle>-0 [001] ..s. 160412.041399: 0: cali2c7c-I: CT-6 lookup from a983526:46334
<idle>-0 [001] ..s. 160412.041400: 0: cali2c7c-I: CT-6 lookup to a9edf83:8080
<idle>-0 [001] ..s. 160412.041402: 0: cali2c7c-I: CT-6 tun_ip:a98351f
<idle>-0 [001] ..s. 160412.041403: 0: cali2c7c-I: CT-6 Hit! NAT REV entry but not connection opener: ESTABLISHED.
<idle>-0 [001] ..s. 160412.041405: 0: cali2c7c-I: CT-6 result: 1
<idle>-0 [001] ..s. 160412.041406: 0: cali2c7c-I: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160412.041408: 0: cali2c7c-I: src=a983526 dst=a9edf83
<idle>-0 [001] ..s. 160412.041409: 0: cali2c7c-I: post_nat=0:0
<idle>-0 [001] ..s. 160412.041409: 0: cali2c7c-I: tun_ip=0
<idle>-0 [001] ..s. 160412.041410: 0: cali2c7c-I: pol_rc=0
<idle>-0 [001] ..s. 160412.041411: 0: cali2c7c-I: sport=46334
<idle>-0 [001] ..s. 160412.041411: 0: cali2c7c-I: flags=0
<idle>-0 [001] ..s. 160412.041412: 0: cali2c7c-I: ct_rc=1
<idle>-0 [001] ..s. 160412.041413: 0: cali2c7c-I: ct_related=0
<idle>-0 [001] ..s. 160412.041414: 0: cali2c7c-I: ip->ttl 62
<idle>-0 [001] ..s. 160412.041416: 0: cali2c7c-I: Final result=ALLOW (-1). Program execution time: 20816ns
<idle>-0 [001] ..s. 160412.041422: 0: ens192---I: IP id=4346 s=a98351f d=a9835c6
<idle>-0 [001] ..s. 160412.041423: 0: ens192---I: VXLAN tunnel packet to c635980a (host IP=c635980a)
<idle>-0 [001] ..s. 160412.041426: 0: ens192---I: ARP update for ifindex 2 ip a98351f
<idle>-0 [001] ..s. 160412.041427: 0: ens192---I: vxlan decap
<idle>-0 [001] ..s. 160412.041428: 0: ens192---I: IP id=4346 s=a983526 d=a9e04c1
<idle>-0 [001] ..s. 160412.041429: 0: ens192---I: vxlan decap origin a98351f
<idle>-0 [001] ..s. 160412.041431: 0: ens192---I: TCP; ports: s=46334 d=80
<idle>-0 [001] ..s. 160412.041432: 0: ens192---I: CT-6 lookup from a983526:46334
<idle>-0 [001] ..s. 160412.041433: 0: ens192---I: CT-6 lookup to a9e04c1:80
<idle>-0 [001] ..s. 160412.041436: 0: ens192---I: CT-6 Hit! NAT FWD entry, doing secondary lookup.
<idle>-0 [001] ..s. 160412.041437: 0: ens192---I: CT-6 fwd tun_ip:a98351f
<idle>-0 [001] ..s. 160412.041440: 0: ens192---I: CT-6 result: 4
<idle>-0 [001] ..s. 160412.041441: 0: ens192---I: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160412.041442: 0: ens192---I: src=a983526 dst=a9e04c1
<idle>-0 [001] ..s. 160412.041443: 0: ens192---I: post_nat=0:0
<idle>-0 [001] ..s. 160412.041444: 0: ens192---I: tun_ip=1f35980a
<idle>-0 [001] ..s. 160412.041445: 0: ens192---I: pol_rc=0
<idle>-0 [001] ..s. 160412.041446: 0: ens192---I: sport=46334
<idle>-0 [001] ..s. 160412.041446: 0: ens192---I: flags=0
<idle>-0 [001] ..s. 160412.041447: 0: ens192---I: ct_rc=4
<idle>-0 [001] ..s. 160412.041448: 0: ens192---I: ct_related=0
<idle>-0 [001] ..s. 160412.041449: 0: ens192---I: ip->ttl 63
<idle>-0 [001] ..s. 160412.041451: 0: ens192---I: CT: DNAT to a9edf83:8080
<idle>-0 [001] ..s. 160412.041452: 0: ens192---I: L4 checksum update (csum is at 50) IP from a9e04c1 to a9edf83
<idle>-0 [001] ..s. 160412.041454: 0: ens192---I: bpf_l4_csum_replace(IP): 0
<idle>-0 [001] ..s. 160412.041455: 0: ens192---I: L4 checksum update (csum is at 50) port from 80 to 8080
<idle>-0 [001] ..s. 160412.041456: 0: ens192---I: bpf_l4_csum_replace(port): 0
<idle>-0 [001] ..s. 160412.041458: 0: ens192---I: IP id=4346 s=a983526 d=a9edf83
<idle>-0 [001] ..s. 160412.041460: 0: ens192---I: FIB family=2
<idle>-0 [001] ..s. 160412.041461: 0: ens192---I: FIB tot_len=158
<idle>-0 [001] ..s. 160412.041462: 0: ens192---I: FIB ifindex=2
<idle>-0 [001] ..s. 160412.041462: 0: ens192---I: FIB l4_protocol=6
<idle>-0 [001] ..s. 160412.041463: 0: ens192---I: FIB sport=46334
<idle>-0 [001] ..s. 160412.041464: 0: ens192---I: FIB dport=8080
<idle>-0 [001] ..s. 160412.041465: 0: ens192---I: FIB ipv4_src=a983526
<idle>-0 [001] ..s. 160412.041465: 0: ens192---I: FIB ipv4_dst=a9edf83
<idle>-0 [001] ..s. 160412.041466: 0: ens192---I: Traffic is towards the host namespace, doing Linux FIB lookup
<idle>-0 [001] ..s. 160412.041468: 0: ens192---I: FIB lookup succeeded
<idle>-0 [001] ..s. 160412.041469: 0: ens192---I: Got Linux FIB hit, redirecting to iface 9.
<idle>-0 [001] ..s. 160412.041471: 0: ens192---I: Traffic is towards host namespace, marking with c3400000.
<idle>-0 [001] ..s. 160412.041472: 0: ens192---I: Final result=ALLOW (7). Program execution time: 50331ns
<idle>-0 [001] ..s. 160412.041474: 0: cali2c7c-I: IP id=4346 s=a983526 d=a9edf83
<idle>-0 [001] ..s. 160412.041476: 0: cali2c7c-I: TCP; ports: s=46334 d=8080
<idle>-0 [001] ..s. 160412.041477: 0: cali2c7c-I: CT-6 lookup from a983526:46334
<idle>-0 [001] ..s. 160412.041478: 0: cali2c7c-I: CT-6 lookup to a9edf83:8080
<idle>-0 [001] ..s. 160412.041480: 0: cali2c7c-I: CT-6 tun_ip:a98351f
<idle>-0 [001] ..s. 160412.041481: 0: cali2c7c-I: CT-6 Hit! NAT REV entry but not connection opener: ESTABLISHED.
<idle>-0 [001] ..s. 160412.041483: 0: cali2c7c-I: CT-6 result: 1
<idle>-0 [001] ..s. 160412.041485: 0: cali2c7c-I: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160412.041486: 0: cali2c7c-I: src=a983526 dst=a9edf83
<idle>-0 [001] ..s. 160412.041486: 0: cali2c7c-I: post_nat=0:0
<idle>-0 [001] ..s. 160412.041487: 0: cali2c7c-I: tun_ip=0
<idle>-0 [001] ..s. 160412.041488: 0: cali2c7c-I: pol_rc=0
<idle>-0 [001] ..s. 160412.041489: 0: cali2c7c-I: sport=46334
<idle>-0 [001] ..s. 160412.041489: 0: cali2c7c-I: flags=0
<idle>-0 [001] ..s. 160412.041490: 0: cali2c7c-I: ct_rc=1
<idle>-0 [001] ..s. 160412.041491: 0: cali2c7c-I: ct_related=0
<idle>-0 [001] ..s. 160412.041491: 0: cali2c7c-I: ip->ttl 62
<idle>-0 [001] ..s. 160412.041494: 0: cali2c7c-I: Final result=ALLOW (-1). Program execution time: 19472ns
<idle>-0 [001] ..s. 160412.041526: 0: cali2c7c-E: IP id=0 s=a9edf83 d=a983526
<idle>-0 [001] ..s. 160412.041528: 0: cali2c7c-E: TCP; ports: s=8080 d=46334
<idle>-0 [001] ..s. 160412.041530: 0: cali2c7c-E: CT-6 lookup from a9edf83:8080
<idle>-0 [001] ..s. 160412.041531: 0: cali2c7c-E: CT-6 lookup to a983526:46334
<idle>-0 [001] ..s. 160412.041533: 0: cali2c7c-E: CT-6 tun_ip:a98351f
<idle>-0 [001] ..s. 160412.041535: 0: cali2c7c-E: CT-6 Hit! NAT REV entry at ingress to connection opener: SNAT.
<idle>-0 [001] ..s. 160412.041536: 0: cali2c7c-E: CT-6 RST seen, marking CT entry.
<idle>-0 [001] ..s. 160412.041537: 0: cali2c7c-E: CT-6 result: 3
<idle>-0 [001] ..s. 160412.041539: 0: cali2c7c-E: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160412.041540: 0: cali2c7c-E: src=a9edf83 dst=a983526
<idle>-0 [001] ..s. 160412.041540: 0: cali2c7c-E: post_nat=0:0
<idle>-0 [001] ..s. 160412.041541: 0: cali2c7c-E: tun_ip=0
<idle>-0 [001] ..s. 160412.041542: 0: cali2c7c-E: pol_rc=0
<idle>-0 [001] ..s. 160412.041543: 0: cali2c7c-E: sport=8080
<idle>-0 [001] ..s. 160412.041543: 0: cali2c7c-E: flags=0
<idle>-0 [001] ..s. 160412.041545: 0: cali2c7c-E: ct_rc=3
<idle>-0 [001] ..s. 160412.041546: 0: cali2c7c-E: ct_related=0
<idle>-0 [001] ..s. 160412.041547: 0: cali2c7c-E: ip->ttl 64
<idle>-0 [001] ..s. 160412.041549: 0: cali2c7c-E: CT: SNAT from a9e04c1:80
<idle>-0 [001] ..s. 160412.041551: 0: cali2c7c-E: L4 checksum update (csum is at 50) IP from a9edf83 to a9e04c1
<idle>-0 [001] ..s. 160412.041553: 0: cali2c7c-E: bpf_l4_csum_replace(IP): 0
<idle>-0 [001] ..s. 160412.041554: 0: cali2c7c-E: L4 checksum update (csum is at 50) port from 8080 to 80
<idle>-0 [001] ..s. 160412.041555: 0: cali2c7c-E: bpf_l4_csum_replace(port): 0
<idle>-0 [001] ..s. 160412.041559: 0: cali2c7c-E: vxlan encap a9edf83 : a98351f
<idle>-0 [001] ..s. 160412.041561: 0: cali2c7c-E: vxlan return 1 ifindex_fwd 2
<idle>-0 [001] ..s. 160412.041563: 0: cali2c7c-E: Redirect directly to interface (2) succeeded.
<idle>-0 [001] ..s. 160412.041565: 0: cali2c7c-E: Traffic is towards host namespace, marking with c3500000.
<idle>-0 [001] ..s. 160412.041566: 0: cali2c7c-E: Final result=ALLOW (7). Program execution time: 40334ns
<idle>-0 [001] ..s. 160412.041568: 0: ens192---E: Packet approved for forward - src ip fixup
<idle>-0 [001] ..s. 160412.041569: 0: ens192---E: IP id=0 s=a9edf83 d=a98351f
<idle>-0 [001] ..s. 160412.041570: 0: ens192---E: src ip fixup a9835c6
<idle>-0 [001] ..s. 160412.041572: 0: ens192---E: Final result=ALLOW (-1). Program execution time: 3612ns
<idle>-0 [001] ..s. 160412.041591: 0: cali2c7c-E: IP id=0 s=a9edf83 d=a983526
<idle>-0 [001] ..s. 160412.041592: 0: cali2c7c-E: TCP; ports: s=8080 d=46334
<idle>-0 [001] ..s. 160412.041594: 0: cali2c7c-E: CT-6 lookup from a9edf83:8080
<idle>-0 [001] ..s. 160412.041595: 0: cali2c7c-E: CT-6 lookup to a983526:46334
<idle>-0 [001] ..s. 160412.041598: 0: cali2c7c-E: CT-6 tun_ip:a98351f
<idle>-0 [001] ..s. 160412.041599: 0: cali2c7c-E: CT-6 Hit! NAT REV entry at ingress to connection opener: SNAT.
<idle>-0 [001] ..s. 160412.041601: 0: cali2c7c-E: CT-6 RST seen, marking CT entry.
<idle>-0 [001] ..s. 160412.041602: 0: cali2c7c-E: CT-6 result: 3
<idle>-0 [001] ..s. 160412.041603: 0: cali2c7c-E: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160412.041604: 0: cali2c7c-E: src=a9edf83 dst=a983526
<idle>-0 [001] ..s. 160412.041605: 0: cali2c7c-E: post_nat=0:0
<idle>-0 [001] ..s. 160412.041606: 0: cali2c7c-E: tun_ip=0
<idle>-0 [001] ..s. 160412.041606: 0: cali2c7c-E: pol_rc=0
<idle>-0 [001] ..s. 160412.041607: 0: cali2c7c-E: sport=8080
<idle>-0 [001] ..s. 160412.041608: 0: cali2c7c-E: flags=0
<idle>-0 [001] ..s. 160412.041608: 0: cali2c7c-E: ct_rc=3
<idle>-0 [001] ..s. 160412.041609: 0: cali2c7c-E: ct_related=0
<idle>-0 [001] ..s. 160412.041610: 0: cali2c7c-E: ip->ttl 64
<idle>-0 [001] ..s. 160412.041612: 0: cali2c7c-E: CT: SNAT from a9e04c1:80
<idle>-0 [001] ..s. 160412.041614: 0: cali2c7c-E: L4 checksum update (csum is at 50) IP from a9edf83 to a9e04c1
<idle>-0 [001] ..s. 160412.041615: 0: cali2c7c-E: bpf_l4_csum_replace(IP): 0
<idle>-0 [001] ..s. 160412.041616: 0: cali2c7c-E: L4 checksum update (csum is at 50) port from 8080 to 80
<idle>-0 [001] ..s. 160412.041618: 0: cali2c7c-E: bpf_l4_csum_replace(port): 0
<idle>-0 [001] ..s. 160412.041621: 0: cali2c7c-E: vxlan encap a9edf83 : a98351f
<idle>-0 [001] ..s. 160412.041623: 0: cali2c7c-E: vxlan return 1 ifindex_fwd 2
<idle>-0 [001] ..s. 160412.041625: 0: cali2c7c-E: Redirect directly to interface (2) succeeded.
<idle>-0 [001] ..s. 160412.041626: 0: cali2c7c-E: Traffic is towards host namespace, marking with c3500000.
<idle>-0 [001] ..s. 160412.041627: 0: cali2c7c-E: Final result=ALLOW (7). Program execution time: 36814ns
<idle>-0 [001] ..s. 160412.041629: 0: ens192---E: Packet approved for forward - src ip fixup
<idle>-0 [001] ..s. 160412.041630: 0: ens192---E: IP id=0 s=a9edf83 d=a98351f
<idle>-0 [001] ..s. 160412.041631: 0: ens192---E: src ip fixup a9835c6
<idle>-0 [001] ..s. 160412.041632: 0: ens192---E: Final result=ALLOW (-1). Program execution time: 3350ns
<idle>-0 [001] ..s. 160412.041673: 0: ens192---I: IP id=4347 s=a98351f d=a9835c6
<idle>-0 [001] ..s. 160412.041674: 0: ens192---I: VXLAN tunnel packet to c635980a (host IP=c635980a)
<idle>-0 [001] ..s. 160412.041676: 0: ens192---I: ARP update for ifindex 2 ip a98351f
<idle>-0 [001] ..s. 160412.041677: 0: ens192---I: vxlan decap
<idle>-0 [001] ..s. 160412.041679: 0: ens192---I: IP id=4347 s=a983526 d=a9e04c1
<idle>-0 [001] ..s. 160412.041680: 0: ens192---I: vxlan decap origin a98351f
<idle>-0 [001] ..s. 160412.041682: 0: ens192---I: TCP; ports: s=46334 d=80
<idle>-0 [001] ..s. 160412.041684: 0: ens192---I: CT-6 lookup from a983526:46334
<idle>-0 [001] ..s. 160412.041685: 0: ens192---I: CT-6 lookup to a9e04c1:80
<idle>-0 [001] ..s. 160412.041687: 0: ens192---I: CT-6 Hit! NAT FWD entry, doing secondary lookup.
<idle>-0 [001] ..s. 160412.041689: 0: ens192---I: CT-6 fwd tun_ip:a98351f
<idle>-0 [001] ..s. 160412.041692: 0: ens192---I: CT-6 result: 4
<idle>-0 [001] ..s. 160412.041693: 0: ens192---I: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160412.041694: 0: ens192---I: src=a983526 dst=a9e04c1
<idle>-0 [001] ..s. 160412.041695: 0: ens192---I: post_nat=0:0
<idle>-0 [001] ..s. 160412.041695: 0: ens192---I: tun_ip=1f35980a
<idle>-0 [001] ..s. 160412.041697: 0: ens192---I: pol_rc=0
<idle>-0 [001] ..s. 160412.041698: 0: ens192---I: sport=46334
<idle>-0 [001] ..s. 160412.041698: 0: ens192---I: flags=0
<idle>-0 [001] ..s. 160412.041699: 0: ens192---I: ct_rc=4
<idle>-0 [001] ..s. 160412.041700: 0: ens192---I: ct_related=0
<idle>-0 [001] ..s. 160412.041701: 0: ens192---I: ip->ttl 63
<idle>-0 [001] ..s. 160412.041703: 0: ens192---I: CT: DNAT to a9edf83:8080
<idle>-0 [001] ..s. 160412.041705: 0: ens192---I: L4 checksum update (csum is at 50) IP from a9e04c1 to a9edf83
<idle>-0 [001] ..s. 160412.041706: 0: ens192---I: bpf_l4_csum_replace(IP): 0
<idle>-0 [001] ..s. 160412.041707: 0: ens192---I: L4 checksum update (csum is at 50) port from 80 to 8080
<idle>-0 [001] ..s. 160412.041709: 0: ens192---I: bpf_l4_csum_replace(port): 0
<idle>-0 [001] ..s. 160412.041710: 0: ens192---I: IP id=4347 s=a983526 d=a9edf83
<idle>-0 [001] ..s. 160412.041712: 0: ens192---I: FIB family=2
<idle>-0 [001] ..s. 160412.041712: 0: ens192---I: FIB tot_len=52
<idle>-0 [001] ..s. 160412.041713: 0: ens192---I: FIB ifindex=2
<idle>-0 [001] ..s. 160412.041714: 0: ens192---I: FIB l4_protocol=6
<idle>-0 [001] ..s. 160412.041715: 0: ens192---I: FIB sport=46334
<idle>-0 [001] ..s. 160412.041715: 0: ens192---I: FIB dport=8080
<idle>-0 [001] ..s. 160412.041716: 0: ens192---I: FIB ipv4_src=a983526
<idle>-0 [001] ..s. 160412.041717: 0: ens192---I: FIB ipv4_dst=a9edf83
<idle>-0 [001] ..s. 160412.041718: 0: ens192---I: Traffic is towards the host namespace, doing Linux FIB lookup
<idle>-0 [001] ..s. 160412.041720: 0: ens192---I: FIB lookup succeeded
<idle>-0 [001] ..s. 160412.041722: 0: ens192---I: Got Linux FIB hit, redirecting to iface 9.
<idle>-0 [001] ..s. 160412.041723: 0: ens192---I: Traffic is towards host namespace, marking with c3400000.
<idle>-0 [001] ..s. 160412.041725: 0: ens192---I: Final result=ALLOW (7). Program execution time: 52754ns
<idle>-0 [001] ..s. 160412.041727: 0: cali2c7c-I: IP id=4347 s=a983526 d=a9edf83
<idle>-0 [001] ..s. 160412.041729: 0: cali2c7c-I: TCP; ports: s=46334 d=8080
<idle>-0 [001] ..s. 160412.041730: 0: cali2c7c-I: CT-6 lookup from a983526:46334
<idle>-0 [001] ..s. 160412.041731: 0: cali2c7c-I: CT-6 lookup to a9edf83:8080
<idle>-0 [001] ..s. 160412.041733: 0: cali2c7c-I: CT-6 tun_ip:a98351f
<idle>-0 [001] ..s. 160412.041735: 0: cali2c7c-I: CT-6 Hit! NAT REV entry but not connection opener: ESTABLISHED.
<idle>-0 [001] ..s. 160412.041737: 0: cali2c7c-I: CT-6 result: 1
<idle>-0 [001] ..s. 160412.041738: 0: cali2c7c-I: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160412.041739: 0: cali2c7c-I: src=a983526 dst=a9edf83
<idle>-0 [001] ..s. 160412.041739: 0: cali2c7c-I: post_nat=0:0
<idle>-0 [001] ..s. 160412.041740: 0: cali2c7c-I: tun_ip=0
<idle>-0 [001] ..s. 160412.041741: 0: cali2c7c-I: pol_rc=0
<idle>-0 [001] ..s. 160412.041742: 0: cali2c7c-I: sport=46334
<idle>-0 [001] ..s. 160412.041742: 0: cali2c7c-I: flags=0
<idle>-0 [001] ..s. 160412.041743: 0: cali2c7c-I: ct_rc=1
<idle>-0 [001] ..s. 160412.041744: 0: cali2c7c-I: ct_related=0
<idle>-0 [001] ..s. 160412.041745: 0: cali2c7c-I: ip->ttl 62
<idle>-0 [001] ..s. 160412.041747: 0: cali2c7c-I: Final result=ALLOW (-1). Program execution time: 19492ns
<idle>-0 [001] ..s. 160412.041766: 0: cali2c7c-E: IP id=0 s=a9edf83 d=a983526
<idle>-0 [001] ..s. 160412.041767: 0: cali2c7c-E: TCP; ports: s=8080 d=46334
<idle>-0 [001] ..s. 160412.041769: 0: cali2c7c-E: CT-6 lookup from a9edf83:8080
<idle>-0 [001] ..s. 160412.041770: 0: cali2c7c-E: CT-6 lookup to a983526:46334
<idle>-0 [001] ..s. 160412.041772: 0: cali2c7c-E: CT-6 tun_ip:a98351f
<idle>-0 [001] ..s. 160412.041774: 0: cali2c7c-E: CT-6 Hit! NAT REV entry at ingress to connection opener: SNAT.
<idle>-0 [001] ..s. 160412.041775: 0: cali2c7c-E: CT-6 RST seen, marking CT entry.
<idle>-0 [001] ..s. 160412.041777: 0: cali2c7c-E: CT-6 result: 3
<idle>-0 [001] ..s. 160412.041778: 0: cali2c7c-E: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160412.041779: 0: cali2c7c-E: src=a9edf83 dst=a983526
<idle>-0 [001] ..s. 160412.041780: 0: cali2c7c-E: post_nat=0:0
<idle>-0 [001] ..s. 160412.041780: 0: cali2c7c-E: tun_ip=0
<idle>-0 [001] ..s. 160412.041781: 0: cali2c7c-E: pol_rc=0
<idle>-0 [001] ..s. 160412.041782: 0: cali2c7c-E: sport=8080
<idle>-0 [001] ..s. 160412.041782: 0: cali2c7c-E: flags=0
<idle>-0 [001] ..s. 160412.041783: 0: cali2c7c-E: ct_rc=3
<idle>-0 [001] ..s. 160412.041784: 0: cali2c7c-E: ct_related=0
<idle>-0 [001] ..s. 160412.041785: 0: cali2c7c-E: ip->ttl 64
<idle>-0 [001] ..s. 160412.041787: 0: cali2c7c-E: CT: SNAT from a9e04c1:80
<idle>-0 [001] ..s. 160412.041789: 0: cali2c7c-E: L4 checksum update (csum is at 50) IP from a9edf83 to a9e04c1
<idle>-0 [001] ..s. 160412.041790: 0: cali2c7c-E: bpf_l4_csum_replace(IP): 0
<idle>-0 [001] ..s. 160412.041791: 0: cali2c7c-E: L4 checksum update (csum is at 50) port from 8080 to 80
<idle>-0 [001] ..s. 160412.041793: 0: cali2c7c-E: bpf_l4_csum_replace(port): 0
<idle>-0 [001] ..s. 160412.041796: 0: cali2c7c-E: vxlan encap a9edf83 : a98351f
<idle>-0 [001] ..s. 160412.041798: 0: cali2c7c-E: vxlan return 1 ifindex_fwd 2
<idle>-0 [001] ..s. 160412.041800: 0: cali2c7c-E: Redirect directly to interface (2) succeeded.
<idle>-0 [001] ..s. 160412.041802: 0: cali2c7c-E: Traffic is towards host namespace, marking with c3500000.
<idle>-0 [001] ..s. 160412.041803: 0: cali2c7c-E: Final result=ALLOW (7). Program execution time: 37848ns
<idle>-0 [001] ..s. 160412.041805: 0: ens192---E: Packet approved for forward - src ip fixup
<idle>-0 [001] ..s. 160412.041806: 0: ens192---E: IP id=0 s=a9edf83 d=a98351f
<idle>-0 [001] ..s. 160412.041807: 0: ens192---E: src ip fixup a9835c6
<idle>-0 [001] ..s. 160412.041808: 0: ens192---E: Final result=ALLOW (-1). Program execution time: 3360ns
<idle>-0 [001] ..s. 160413.072450: 0: cali2c7c-E: IP id=0 s=a9edf83 d=a983526
<idle>-0 [001] ..s. 160413.072483: 0: cali2c7c-E: TCP; ports: s=8080 d=46334
<idle>-0 [001] ..s. 160413.072486: 0: cali2c7c-E: CT-6 lookup from a9edf83:8080
<idle>-0 [001] ..s. 160413.072487: 0: cali2c7c-E: CT-6 lookup to a983526:46334
<idle>-0 [001] ..s. 160413.072492: 0: cali2c7c-E: CT-6 tun_ip:a98351f
<idle>-0 [001] ..s. 160413.072494: 0: cali2c7c-E: CT-6 Hit! NAT REV entry at ingress to connection opener: SNAT.
<idle>-0 [001] ..s. 160413.072497: 0: cali2c7c-E: CT-6 result: 3
<idle>-0 [001] ..s. 160413.072499: 0: cali2c7c-E: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160413.072500: 0: cali2c7c-E: src=a9edf83 dst=a983526
<idle>-0 [001] ..s. 160413.072501: 0: cali2c7c-E: post_nat=0:0
<idle>-0 [001] ..s. 160413.072501: 0: cali2c7c-E: tun_ip=0
<idle>-0 [001] ..s. 160413.072504: 0: cali2c7c-E: pol_rc=0
<idle>-0 [001] ..s. 160413.072505: 0: cali2c7c-E: sport=8080
<idle>-0 [001] ..s. 160413.072506: 0: cali2c7c-E: flags=0
<idle>-0 [001] ..s. 160413.072507: 0: cali2c7c-E: ct_rc=3
<idle>-0 [001] ..s. 160413.072508: 0: cali2c7c-E: ct_related=0
<idle>-0 [001] ..s. 160413.072509: 0: cali2c7c-E: ip->ttl 64
<idle>-0 [001] ..s. 160413.072511: 0: cali2c7c-E: CT: SNAT from a9e04c1:80
<idle>-0 [001] ..s. 160413.072514: 0: cali2c7c-E: L4 checksum update (csum is at 50) IP from a9edf83 to a9e04c1
<idle>-0 [001] ..s. 160413.072516: 0: cali2c7c-E: bpf_l4_csum_replace(IP): 0
<idle>-0 [001] ..s. 160413.072518: 0: cali2c7c-E: L4 checksum update (csum is at 50) port from 8080 to 80
<idle>-0 [001] ..s. 160413.072519: 0: cali2c7c-E: bpf_l4_csum_replace(port): 0
<idle>-0 [001] ..s. 160413.072524: 0: cali2c7c-E: vxlan encap a9edf83 : a98351f
<idle>-0 [001] ..s. 160413.072527: 0: cali2c7c-E: vxlan return 1 ifindex_fwd 2
<idle>-0 [001] ..s. 160413.072531: 0: cali2c7c-E: Redirect directly to interface (2) succeeded.
<idle>-0 [001] ..s. 160413.072532: 0: cali2c7c-E: Traffic is towards host namespace, marking with c3500000.
<idle>-0 [001] ..s. 160413.072534: 0: cali2c7c-E: Final result=ALLOW (7). Program execution time: 88851ns
<idle>-0 [001] ..s. 160413.072540: 0: ens192---E: Packet approved for forward - src ip fixup
<idle>-0 [001] ..s. 160413.072542: 0: ens192---E: IP id=0 s=a9edf83 d=a98351f
<idle>-0 [001] ..s. 160413.072543: 0: ens192---E: src ip fixup a9835c6
<idle>-0 [001] ..s. 160413.072545: 0: ens192---E: Final result=ALLOW (-1). Program execution time: 5312ns
<idle>-0 [001] ..s. 160413.073124: 0: ens192---I: IP id=22866 s=a98351f d=a9835c6
<idle>-0 [001] ..s. 160413.073142: 0: ens192---I: VXLAN tunnel packet to c635980a (host IP=c635980a)
<idle>-0 [001] ..s. 160413.073148: 0: ens192---I: ARP update for ifindex 2 ip a98351f
<idle>-0 [001] ..s. 160413.073150: 0: ens192---I: vxlan decap
<idle>-0 [001] ..s. 160413.073153: 0: ens192---I: IP id=22866 s=a983526 d=a9e04c1
<idle>-0 [001] ..s. 160413.073154: 0: ens192---I: vxlan decap origin a98351f
<idle>-0 [001] ..s. 160413.073156: 0: ens192---I: TCP; ports: s=46334 d=80
<idle>-0 [001] ..s. 160413.073158: 0: ens192---I: CT-6 lookup from a983526:46334
<idle>-0 [001] ..s. 160413.073160: 0: ens192---I: CT-6 lookup to a9e04c1:80
<idle>-0 [001] ..s. 160413.073163: 0: ens192---I: CT-6 Hit! NAT FWD entry, doing secondary lookup.
<idle>-0 [001] ..s. 160413.073166: 0: ens192---I: CT-6 fwd tun_ip:a98351f
<idle>-0 [001] ..s. 160413.073169: 0: ens192---I: CT-6 RST seen, marking CT entry.
<idle>-0 [001] ..s. 160413.073171: 0: ens192---I: CT-6 result: 4
<idle>-0 [001] ..s. 160413.073172: 0: ens192---I: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160413.073173: 0: ens192---I: src=a983526 dst=a9e04c1
<idle>-0 [001] ..s. 160413.073174: 0: ens192---I: post_nat=0:0
<idle>-0 [001] ..s. 160413.073175: 0: ens192---I: tun_ip=1f35980a
<idle>-0 [001] ..s. 160413.073176: 0: ens192---I: pol_rc=0
<idle>-0 [001] ..s. 160413.073177: 0: ens192---I: sport=46334
<idle>-0 [001] ..s. 160413.073178: 0: ens192---I: flags=0
<idle>-0 [001] ..s. 160413.073178: 0: ens192---I: ct_rc=4
<idle>-0 [001] ..s. 160413.073179: 0: ens192---I: ct_related=0
<idle>-0 [001] ..s. 160413.073180: 0: ens192---I: ip->ttl 63
<idle>-0 [001] ..s. 160413.073183: 0: ens192---I: CT: DNAT to a9edf83:8080
<idle>-0 [001] ..s. 160413.073186: 0: ens192---I: L4 checksum update (csum is at 50) IP from a9e04c1 to a9edf83
<idle>-0 [001] ..s. 160413.073187: 0: ens192---I: bpf_l4_csum_replace(IP): 0
<idle>-0 [001] ..s. 160413.073189: 0: ens192---I: L4 checksum update (csum is at 50) port from 80 to 8080
<idle>-0 [001] ..s. 160413.073190: 0: ens192---I: bpf_l4_csum_replace(port): 0
<idle>-0 [001] ..s. 160413.073193: 0: ens192---I: IP id=22866 s=a983526 d=a9edf83
<idle>-0 [001] ..s. 160413.073195: 0: ens192---I: FIB family=2
<idle>-0 [001] ..s. 160413.073196: 0: ens192---I: FIB tot_len=40
<idle>-0 [001] ..s. 160413.073196: 0: ens192---I: FIB ifindex=2
<idle>-0 [001] ..s. 160413.073197: 0: ens192---I: FIB l4_protocol=6
<idle>-0 [001] ..s. 160413.073198: 0: ens192---I: FIB sport=46334
<idle>-0 [001] ..s. 160413.073199: 0: ens192---I: FIB dport=8080
<idle>-0 [001] ..s. 160413.073199: 0: ens192---I: FIB ipv4_src=a983526
<idle>-0 [001] ..s. 160413.073200: 0: ens192---I: FIB ipv4_dst=a9edf83
<idle>-0 [001] ..s. 160413.073201: 0: ens192---I: Traffic is towards the host namespace, doing Linux FIB lookup
<idle>-0 [001] ..s. 160413.073209: 0: ens192---I: FIB lookup succeeded
<idle>-0 [001] ..s. 160413.073211: 0: ens192---I: Got Linux FIB hit, redirecting to iface 9.
<idle>-0 [001] ..s. 160413.073213: 0: ens192---I: Traffic is towards host namespace, marking with c3400000.
<idle>-0 [001] ..s. 160413.073215: 0: ens192---I: Final result=ALLOW (7). Program execution time: 93250ns
<idle>-0 [001] ..s. 160413.073221: 0: cali2c7c-I: IP id=22866 s=a983526 d=a9edf83
<idle>-0 [001] ..s. 160413.073223: 0: cali2c7c-I: TCP; ports: s=46334 d=8080
<idle>-0 [001] ..s. 160413.073225: 0: cali2c7c-I: CT-6 lookup from a983526:46334
<idle>-0 [001] ..s. 160413.073227: 0: cali2c7c-I: CT-6 lookup to a9edf83:8080
<idle>-0 [001] ..s. 160413.073230: 0: cali2c7c-I: CT-6 tun_ip:a98351f
<idle>-0 [001] ..s. 160413.073232: 0: cali2c7c-I: CT-6 Hit! NAT REV entry but not connection opener: ESTABLISHED.
<idle>-0 [001] ..s. 160413.073234: 0: cali2c7c-I: CT-6 RST seen, marking CT entry.
<idle>-0 [001] ..s. 160413.073236: 0: cali2c7c-I: CT-6 result: 1
<idle>-0 [001] ..s. 160413.073237: 0: cali2c7c-I: Entering calico_tc_skb_accepted
<idle>-0 [001] ..s. 160413.073239: 0: cali2c7c-I: src=a983526 dst=a9edf83
<idle>-0 [001] ..s. 160413.073240: 0: cali2c7c-I: post_nat=0:0
<idle>-0 [001] ..s. 160413.073241: 0: cali2c7c-I: tun_ip=0
<idle>-0 [001] ..s. 160413.073242: 0: cali2c7c-I: pol_rc=0
<idle>-0 [001] ..s. 160413.073243: 0: cali2c7c-I: sport=46334
<idle>-0 [001] ..s. 160413.073244: 0: cali2c7c-I: flags=0
<idle>-0 [001] ..s. 160413.073245: 0: cali2c7c-I: ct_rc=1
<idle>-0 [001] ..s. 160413.073245: 0: cali2c7c-I: ct_related=0
<idle>-0 [001] ..s. 160413.073246: 0: cali2c7c-I: ip->ttl 62
<idle>-0 [001] ..s. 160413.073249: 0: cali2c7c-I: Final result=ALLOW (-1). Program execution time: 29533ns
Currently, each new incoming SYN gets load balanced independently; wonder if we should re-use the same backend to avoid any cross-talk.
After reviewing some of the pcaps I think @fasaxc hit the nail right on. I think we considered resolving this issue before, we were worried that reusing the faulty backend may result in the connection being stuck for a very long time. If the backend does not exist anymore, we will eventually (within 10s) clean up the conntrack, if the backend is just faulty, but still exists from k8s point of view, we would solely rely on the connection timing out.
In this specific case it seems like an issue external to the bpf dataplane triggers an unfortunate behaviour that starts a handshake with one backed, but thanks to a SYN retransmit (the SYN+ACK) likely lost on the way back to the LB starts a handshake with another pod. The following pic captures what LB sees:
1s gap SYN+ACK lost, LB decides to send another SYN. If the SYN+ACK wasn't lost, it would keep resending ACKs
another 1s gap due to a tcp timer
Everything so messed up that I both sides gave up or the dump ends :shrug:
@fasaxc In my case, I have not enabled DSR mode. Using the default configuration for that.
@renan @emptywee I made a test build of calico/node
with a potential fix if you want to try it out; the image name is fasaxc/calico-node:ct-syns-2
. Obviously, it's a test build so you probably don't want to drop it onto your production cluster!
@fasaxc This is looking good!
I've tried your build on the two initial scenarios of this issue and it had no failures!
I am going to test a bit more but this might be it.
11:40:47 INFO: Test duration: 0:10:18
11:40:47 INFO: Samples count: 257361, 0.00% failures
11:40:47 INFO: Average times: total 0.046, latency 0.046, connect 0.002
11:40:47 INFO: Percentiles:
┌───────────────┬───────────────┐
│ Percentile, % │ Resp. Time, s │
├───────────────┼───────────────┤
│ 0.0 │ 0.024 │
│ 50.0 │ 0.032 │
│ 90.0 │ 0.067 │
│ 95.0 │ 0.149 │
│ 99.0 │ 0.193 │
│ 99.9 │ 0.456 │
│ 100.0 │ 1.206 │
└───────────────┴───────────────┘
11:40:47 INFO: Request label stats:
┌──────────────────────────┬────────┬─────────┬────────┬───────┐
│ label │ status │ succ │ avg_rt │ error │
├──────────────────────────┼────────┼─────────┼────────┼───────┤
│ http://10.209.2.32:30080 │ OK │ 100.00% │ 0.046 │ │
│ http://10.209.3.68:30080 │ OK │ 100.00% │ 0.046 │ │
└──────────────────────────┴────────┴─────────┴────────┴───────┘
11:59:49 INFO: Test duration: 0:10:18
11:59:49 INFO: Samples count: 256162, 0.00% failures
11:59:49 INFO: Average times: total 0.046, latency 0.046, connect 0.003
11:59:49 INFO: Percentiles:
┌───────────────┬───────────────┐
│ Percentile, % │ Resp. Time, s │
├───────────────┼───────────────┤
│ 0.0 │ 0.024 │
│ 50.0 │ 0.032 │
│ 90.0 │ 0.068 │
│ 95.0 │ 0.149 │
│ 99.0 │ 0.197 │
│ 99.9 │ 0.466 │
│ 100.0 │ 1.582 │
└───────────────┴───────────────┘
11:59:49 INFO: Request label stats:
┌──────────────────────────┬────────┬─────────┬────────┬───────┐
│ label │ status │ succ │ avg_rt │ error │
├──────────────────────────┼────────┼─────────┼────────┼───────┤
│ http://10.209.2.32:30080 │ OK │ 100.00% │ 0.046 │ │
│ http://10.209.3.68:30080 │ OK │ 100.00% │ 0.046 │ │
└──────────────────────────┴────────┴─────────┴────────┴───────┘
That's great; I think I've missed the boat for v3.18.0 (we're already in code freeze for the release) at this point but I'll try to get this into v3.19 and v3.18.1
@fasaxc I am installing your test image in our lab and dev/qa clusters. I am confident the issue is fixed since it has been fixed for @renan and we probably had same problem by nature.
Sounds like this can be closed?
@caseydavenport yes, but would be nice to hear when it will be merged and released, at least approximately. We've been waiting for the fix for quite a while now :)
And thanks @fasaxc and @tomastigera very much for looking into this and fixing quickly.
Yeah, would be nice to keep it open until a fix is merged.
I believe projectcalico/felix#2675 is currently being worked on as a fix?
Yes, I just got back to that and added a test so I'm just waiting on review now.
Following the discussion on Slack about running Calico on eBPF: https://calicousers.slack.com/archives/CUKP5S64R/p1601550116016000
When running Calico on eBPF mode with a Kubernetes Service listening on a NodePort and running two backend pods, I have experienced 43 connection failures, out of around 250 thousand requests.
Running the same scenario, but with one backend pod, resulted in absolutely no connection errors.
The same setup was used with Calico on iptables mode and there were also no connection errors, regardless of the number of backend pods.
Setup
Kubernetes
v1.19.2
(self-managed, the hard way) running on AWS with one master node and two worker nodes. Nodes run on Ubuntu20.04.1 LTS
with Kernel5.4.0-1029-aws
and Docker-runtime19.3.8
.Calico
v3.17.1
on eBPF mode and flagsFELIX_BPFENABLED=true, CALICO_IPV4POOL_IPIP=Never, CALICO_IPV4POOL_VXLAN=Never, CALICO_IPV4POOL_NAT_OUTGOING=true
.Taurus was configured to do 20 concurrent requests for 10 minutes and making requests to both nodes on the exposed node port. Therefore, each node received around 125 thousand requests each.
There was no conversion from eBPF to iptables or vice-versa, the cluster was created from scratch with the specific setup.
10.199.1.4
10.209.0.203:30080
10.210.76.132:8080
10.209.2.205:30080
10.210.79.5:8080
Running only the bare minimum pods.
Service with NodePort
Deployment
One pod, no failures
I have run it multiple times. It hasn't failed so far.
One pod on each node, 43 failures
tcpdump
's are available here: https://www.dropbox.com/sh/hydbxyrlo9qdvwa/AACMxFb9YlbwSo1NzZLNGdyHaI have run it many times. The number of failures is not static. It always fails.
tcpdump's commands
Node one
Node two
Other runs
2 pods, all on node one = 19 failures 2 pods, all on node two = 22 failures
10 pods, all on node one = 31 failures 10 pods, all on node two = 26 failures
10 pods, 5 on each node = 24 failures 50 pods, 25 on each node = 38 failures