Closed: iksoon-park closed this issue 1 month ago.
In your route -n output I would expect to see:
10.254.0.10 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
Without that route the rest cannot quite work and the packets are bounced around. Could you share calico logs from that node? They should show that it is trying to program this route for the service and that it fails somehow. I think it might be related to this fix: https://github.com/projectcalico/calico/pull/8983. You are likely to see the issue with any UDP service accessed from a host-networked pod.
Without that route, the packet follows the default route (eth0), where it happens to be NATed and turned around, so it still reaches the local service pod. On the way back, however, it does not get NATed back (because it does not follow the expected path) and lands on the host, where there is no socket to accept it, so it generates the ICMP type 3 code 3 response (destination unreachable, port unreachable).
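A quick way to confirm this on the node (just a sketch, not from the original report): check whether the service route exists and watch eth0 for the port-unreachable replies.
# is the expected route via bpfin.cali present?
ip route show | grep 10.254.0.10
# if it is missing, the turned-around reply shows up on eth0 as ICMP type 3 code 3:
tcpdump -ni eth0 'icmp[icmptype] == 3 and icmp[icmpcode] == 3'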
@tomastigera Thanks for the reply. But I think it's different from issue https://github.com/projectcalico/calico/pull/8983.
The meaning of each IP in the test in the main text is as follows:
As written in the main text, the routing table of the worker node where the problem occurred is as follows.
I checked the routing table, but the route you mentioned below is not present:
10.254.0.10 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
As written in the BPF log in the main text, when the initial request is made, the service ClusterIP "10.254.0.10" is correctly translated to "10.100.5.127" by BPF NAT.
1. Sending packets from host to coreDNS
Final result=ALLOW
Changed by NAT as follows:
After that, the response is handled with the CoreDNS pod IP "10.100.5.127" instead of the ClusterIP "10.254.0.10".
Please check the status of the BPF NAT table and the Routing table.
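For reference, a sketch of how both can be dumped on the affected worker node (the calico-node pod name is the one used later in this issue):
# BPF NAT table, where the ClusterIP -> pod IP translation lives:
kubectl exec -it calico-node-76h79 -n kube-system -- calico-node -bpf nat dump
# kernel routing table, where the 10.254.0.10 route via bpfin.cali would be expected:
route -n | grep 10.254.0.10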
I ran tcpdump on the bpfin.cali and bpfout.cali NICs and, as expected, no packets were observed, since there are no matching entries in the routing table. The following figure shows this.
I am also providing the calico-node pod logs from the cluster where the issue occurred. calico-node.log
This issue is resolved and everything works correctly when switching to Calico v3.24.1 in the same cluster. The issue occurs on versions 3.27 through 3.28. Could you please check again?
@iksoon-park The missing route is the issue. As mentioned above, although the packet gets NATed to the correct pod IP, the return path is unexpected and results in failure. We need to find out why the routes are missing.
Do you see the logs below in calico-node? They might only appear after some time.
Remove old route dest=10.254.0.10 ifaceName="bpfin.cali" ifaceRegex="bpfin.cali"
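One way to check for it, assuming kubectl access to the calico-node pod on the affected node (pod name taken from this issue):
kubectl logs -n kube-system calico-node-76h79 | grep "Remove old route dest=10.254.0.10"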
Did you try with 3.28.1?
@sridhartigera, @tomastigera
I understand. However, even after some time, the following log is not observed in calico-node:
Remove old route dest=10.254.0.10 ifaceName="bpfin.cali" ifaceRegex="bpfin.cali"
The same issue occurs in Calico v3.28.1. I have also checked all versions of 3.27, and the same problem occurs.
"10.254.0.10" corresponds to the ClusterIP of Kubernetes. As far as I know, the rule for this IP is not created in the routing table but is translated into the Pod IP by the NAT in the BPF Map.
Thus, I believe this issue is caused by Calico eBPF. Is there anything I might be misunderstanding?
@iksoon-park Please provide your felixconfiguration and calico-node debug logs. Set logSeverityScreen to Debug in the felix configuration.
@sridhartigera
As mentioned in the text, the Calico configuration in the test environment is as follows (Calico Setting).
My FelixConfiguration settings are as follows. The environment variables are set as below:
NODENAME=${node_name}
CALICO_NETWORKING_BACKEND=bird
IP_AUTODETECTION_METHOD=interface=eth0
IP=autodetect
ETCD_ENDPOINTS=${etcd_endpoint}
ETCD_CERT_FILE=${file_path_cert.crt}
ETCD_CA_CERT_FILE=${file_path_ca_cert.crt}
ETCD_KEY_FILE=${file_path_key.pem}
ETCD_DISCOVERY_SRV=
CALICO_MANAGE_CNI=false
DATASTORE_TYPE=etcdv3
CALICO_IPV4POOL_IPIP=Never
CALICO_IPV4POOL_VXLAN=Never
CALICO_IPV6POOL_VXLAN=Never
FELIX_IPINIPMTU=0
FELIX_VXLANMTU=0
FELIX_WIREGUARDMTU=0
CALICO_IPV4POOL_CIDR=10.100.0.0/16
CALICO_IPV4POOL_BLOCK_SIZE=24
CALICO_DISABLE_FILE_LOGGING=true
FELIX_DEFAULTENDPOINTTOHOSTACTION=ACCEPT
FELIX_IPV6SUPPORT=false
FELIX_HEALTHENABLED=true
FELIX_BPFENABLED=true
FELIX_BPFDISABLEUNPRIVILEGED=true
FELIX_BPFKUBEPROXYIPTABLESCLEANUPENABLED=true
FELIX_BPFKUBEPROXYENDPOINTSLICESENABLED=true
FELIX_BPFDATAIFACEPATTERN=eth0
FELIX_XDPENABLED=true
FELIX_BPFLOGLEVEL=Debug
The configuration confirmed with calicoctl is as follows.
[calicoctl get felixconfiguration]
NAME
default
node.iksoon-27-default-worker-node-0
node.iksoon-27-master-0
node.iksoon-27-master-1
node.iksoon-27-master-2
[calicoctl get felixconfiguration default -o yaml]
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  creationTimestamp: "2024-08-27T04:30:42Z"
  name: default
  resourceVersion: "240"
  uid: 329eeeb3-5703-4e1e-a14e-937b8f6da84a
spec:
  bpfConnectTimeLoadBalancing: TCP
  bpfHostNetworkedNATWithoutCTLB: Enabled
  bpfLogLevel: ""
  floatingIPs: Disabled
  logSeverityScreen: Info
  reportingInterval: 0s
[calicoctl get felixconfiguration node.iksoon-27-master-0 -o yaml]
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  creationTimestamp: "2024-08-27T04:30:42Z"
  name: node.iksoon-27-master-0
  resourceVersion: "241"
  uid: 01a765c1-6237-44dd-a7c1-eab0b28da8dd
spec:
  bpfConnectTimeLoadBalancing: TCP
  bpfHostNetworkedNATWithoutCTLB: Enabled
  bpfLogLevel: ""
  defaultEndpointToHostAction: Return
  floatingIPs: Disabled
[calicoctl get felixconfiguration node.iksoon-27-default-worker-node-0 -oyaml]
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  creationTimestamp: "2024-08-27T04:36:38Z"
  name: node.iksoon-27-default-worker-node-0
  resourceVersion: "1367"
  uid: d208f176-cf76-4ad5-a1c5-3792d44c28cd
spec:
  bpfConnectTimeLoadBalancing: TCP
  bpfHostNetworkedNATWithoutCTLB: Enabled
  bpfLogLevel: ""
  defaultEndpointToHostAction: Return
  floatingIPs: Disabled
The log content related to 10.254.0.10 is as follows. As I expected, it is handled by BPF NAT and no route is programmed in the routing table.
2024-09-17 14:48:42.993 [DEBUG][67] felix/syncer.go 1046: resolved NATKey{Proto:6 Addr:10.254.0.10 Port:53 SrcAddr:0.0.0.0/0} as kube-system/kube-dns:dns-tcp
2024-09-17 14:48:42.993 [DEBUG][67] felix/syncer.go 1046: resolved NATKey{Proto:17 Addr:10.254.0.10 Port:53 SrcAddr:0.0.0.0/0} as kube-system/kube-dns:dns
2024-09-17 14:48:42.993 [DEBUG][67] felix/syncer.go 1046: resolved NATKey{Proto:6 Addr:10.254.0.10 Port:9153 SrcAddr:0.0.0.0/0} as kube-system/kube-dns:metrics
2024-09-17 14:48:42.994 [DEBUG][67] felix/syncer.go 585: Applying new state, {map[default/iksoon-nginx-service:10.254.152.135:8081/TCP default/kubernetes:https:10.254.0.1:443/TCP kube-system/calico-typha:calico-typha:10.254.127.49:5473/TCP kube-system/csi-cinder-controller-service:dummy:10.254.200.217:12345/TCP kube-system/kube-dns:dns:10.254.0.10:53/UDP kube-system/kube-dns:dns-tcp:10.254.0.10:53/TCP kube-system/kube-dns:metrics:10.254.0.10:9153/TCP kube-system/metrics-server:10.254.52.186:443/TCP] map[default/iksoon-nginx-service:[10.100.51.8:80 10.100.51.9:80] default/kubernetes:https:[192.168.0.106:6443 192.168.0.130:6443 192.168.0.28:6443] kube-system/calico-typha:calico-typha:[192.168.0.127:5473] kube-system/csi-cinder-controller-service:dummy:[10.100.51.2:12345] kube-system/kube-dns:dns:[10.100.51.0:53 10.100.51.12:53] kube-system/kube-dns:dns-tcp:[10.100.51.0:53 10.100.51.12:53] kube-system/kube-dns:metrics:[10.100.51.0:9153 10.100.51.12:9153] kube-system/metrics-server:[10.100.51.3:443]] kr-pub-a}
2024-09-17 14:48:42.994 [DEBUG][67] felix/syncer.go 942: bpf map writing NATKey{Proto:17 Addr:10.254.0.10 Port:53 SrcAddr:0.0.0.0/0}:NATValue{ID:2,Count:2,LocalCount:2,AffinityTimeout:0,Flags:{}}
2024-09-17 14:48:42.994 [DEBUG][67] felix/delta_tracker.go 125: Set bpfMap="cali_v4_nat_fe3" k=NATKey{Proto:17 Addr:10.254.0.10 Port:53 SrcAddr:0.0.0.0/0} v=NATValue{ID:2,Count:2,LocalCount:2,AffinityTimeout:0,Flags:{}}
2024-09-17 14:48:42.994 [DEBUG][67] felix/syncer.go 942: bpf map writing NATKey{Proto:6 Addr:10.254.0.10 Port:53 SrcAddr:0.0.0.0/0}:NATValue{ID:3,Count:2,LocalCount:2,AffinityTimeout:0,Flags:{}}
2024-09-17 14:48:42.994 [DEBUG][67] felix/delta_tracker.go 125: Set bpfMap="cali_v4_nat_fe3" k=NATKey{Proto:6 Addr:10.254.0.10 Port:53 SrcAddr:0.0.0.0/0} v=NATValue{ID:3,Count:2,LocalCount:2,AffinityTimeout:0,Flags:{}}
2024-09-17 14:48:42.995 [DEBUG][67] felix/syncer.go 942: bpf map writing NATKey{Proto:6 Addr:10.254.0.10 Port:9153 SrcAddr:0.0.0.0/0}:NATValue{ID:4,Count:2,LocalCount:2,AffinityTimeout:0,Flags:{}}
2024-09-17 14:48:42.995 [DEBUG][67] felix/delta_tracker.go 125: Set bpfMap="cali_v4_nat_fe3" k=NATKey{Proto:6 Addr:10.254.0.10 Port:9153 SrcAddr:0.0.0.0/0} v=NATValue{ID:4,Count:2,LocalCount:2,AffinityTimeout:0,Flags:{}}
I am also providing the calico-node log file with the logSeverityScreen setting set to Debug. calico-node-log.tar.gz
Please check and confirm.
The logs show that calico-node does not get any update about the service, so we never program the route. kube-proxy does, but it gets the services directly from the kubernetes apiserver. You, however, are using Typha and etcd. It seems Typha only sends updates about endpoints and not about services. You would have to figure out whether etcd has the services.
Could you describe your exact setup? What k8s platform do you use (version) - you mentioned a few.
Note that not having service information propagated to calico-node may have an effect not just on networking but on policy as well.
I recommend using KDD mode: it is the mainline path and receives far more testing attention. etcd mode has been generally unnecessary for many years, since Typha exists to mitigate the k8s API server bottlenecks.
The install docs also advise against it - see https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises for example. I'm not sure if Typha+etcd is even documented anywhere at all.
If you're able to switch to KDD that would be my suggestion - if not, could you please describe why you need to use etcd mode? Thanks
Turns out that this specific feature does not work with etcd. If you need to use etcd, set bpfConnectTimeLoadBalancing=Enabled and bpfHostNetworkedNATWithoutCTLB=Disabled. However, you may experience some connectivity issues with DNS if a DNS backend that is actively used dies/migrates to a different node; see https://github.com/projectcalico/calico/issues/4509
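One way to apply this with calicoctl (a sketch based on the configuration shown above; note that the per-node FelixConfigurations above also set these two fields, so they would need the same change):
calicoctl patch felixconfiguration default --patch '{"spec":{"bpfConnectTimeLoadBalancing":"Enabled","bpfHostNetworkedNATWithoutCTLB":"Disabled"}}'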
@tomastigera Thank you for your response. I have confirmed that everything is working properly with the settings you suggested.
I am currently testing one potential issue, but even when I redeploy or move CoreDNS to a different node, no problems occur.
Here is my test case: deploy test server pod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iksoon-deployment-nginx
  labels:
    app: iksoon-nginx-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: iksoon-pod-nginx
  template:
    metadata:
      labels:
        app: iksoon-pod-nginx
    spec:
      containers:
      - name: iksoon-nginx
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: iksoon-nginx-service
spec:
  ports:
  - port: 8081
    targetPort: 80
  selector:
    app: iksoon-pod-nginx
deploy test client pod
apiVersion: v1
kind: Pod
metadata:
  name: netshoot
  labels:
    app: netshoot
spec:
  containers:
  - name: netshoot
    image: nicolaka/netshoot
    imagePullPolicy: IfNotPresent
    command: ["/bin/sleep"]
    args: ["3650d"]
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
  restartPolicy: Always
I attempted the following steps:
Requests every 0.5 seconds after connecting to netshoot
kubectl exec -it netshoot -- bash
while true ; do curl -s -o /dev/null -w "%{http_code}\n" iksoon-nginx-service:8081/ ; sleep 0.5 ; done
coreDNS rollout
kubectl rollout restart deployment/coredns -n kube-system (move CoreDNS to a different node)
result : OK (200 return)
test server pod rollout
kubectl rollout restart deployment.apps/iksoon-deployment-nginx
result : OK (200 return)
I’ve tried multiple times, but the results are always normal. Has the issue with https://github.com/projectcalico/calico/issues/4509 been resolved? Or did I perform the test incorrectly?
How can I reproduce the issue you mentioned earlier?
It depends on how the application uses DNS. If the application uses connect() with UDP, one backend is picked for the life of the socket. If the app never creates a new socket after it stops getting responses, so that a new backend could be picked, it gets stuck.
In most cases it is not an issue, but in some situations it is, so our default is to avoid the "some". Many deployments used to live without that happily before it got reported. You may be the lucky one.
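As an illustration of the two client behaviors (just a sketch; the ClusterIP is the kube-dns address from this issue and the tools are only examples):
# a fresh UDP socket per query: if the chosen backend dies, the next query can pick a live one
while true; do dig +short +time=1 kubernetes.default.svc.cluster.local @10.254.0.10; sleep 0.5; done
# one long-lived connect()ed UDP socket (e.g. nc -u): the backend picked at connect() time is kept
# for the life of the socket, so queries stall if that CoreDNS pod goes away
nc -u 10.254.0.10 53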
Deploy a Pod with the following options in a Calico cluster:
Calico Setting
Pod spec
Run the "
nslookup kubernetes
" command on that podIt has been confirmed that there is an issue with domain queries not being made using coreDNS.
This is the pod yaml used for testing.
After deploying the pod, I ran the following command.
The execution results are as follows.
Expected Behavior
I want the command to execute successfully.
Current Behavior
The test environment is as follows:
pod deployment info
worker node info
"worker node" nat tables
kubectl exec -it calico-node-76h79 -n kube-system -- calico-node -bpf nat dump --log-level debug
"worker node" routing table
tcpdump log
Let's check the packet below.
The problem situation can be expressed in a diagram as follows:
Let's check the BPF log for all sections.
1. Sending packets from host to coreDNS
Final result=ALLOW
Changed by NAT as follows:
It is OK
2. The packet arrived at the coreDNS pod
Final result=ALLOW
The CoreDNS pod received the packet. This is OK.
3. CoreDNS sends a response and forwards the packet to the host.
Final result=ALLOW
Changed by NAT as follows:
In my opinion, this is also okay.
4. This is the error section. There is no log confirming arrival at eth0.
An ICMP log appears out of nowhere instead.
Possible Solution
There is no problem with Calico v3.24.1. Just change the version to v3.24.1 in the same cluster and it works fine.
Steps to Reproduce (for bugs)
Context
k8s cluster networking error
Your Environment