kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] LoadBalancer Service TCP Connection Fails in Cilium Kube-Proxy Replacement Mode #4739

Open MuNeNICK opened 2 hours ago

MuNeNICK commented 2 hours ago

Kube-OVN Version

v1.12.28

Cilium Version

1.16.3

Kubernetes Version

Client Version: v1.31.2+k3s1
Kustomize Version: v5.4.2
Server Version: v1.31.2+k3s1

Operation-system/Kernel Version

"Ubuntu 22.04.5 LTS" 5.15.0-125-generic

Description

Hello.

I want to use Kube-OVN's LoadBalancer in a Cilium Kube-Proxy Replacement environment.

The LoadBalancer-type Service I created, shown below, is not working:

ubuntu@ubuntu:~/test-svc$ kubectl get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP           NODE     NOMINATED NODE   READINESS GATES
lb-svc-podinfo-576656f7b-85hvd   1/1     Running   0          3m44s   10.16.0.11   ubuntu   <none>           <none>
podinfo-6bd97dfb99-rj4g6         1/1     Running   0          3m55s   10.16.0.9    ubuntu   <none>           <none>
podinfo-6bd97dfb99-t6rxd         1/1     Running   0          3m55s   10.16.0.10   ubuntu   <none>           <none>
ubuntu@ubuntu:~/test-svc$ kubectl get svc
NAME         TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)          AGE
kubernetes   ClusterIP      10.43.0.1     <none>         443/TCP          30m
podinfo      LoadBalancer   10.43.5.219   192.168.0.62   9898:31442/TCP   3m49s
ubuntu@ubuntu:~/test-svc$

Ping works, but curl doesn't:

C:\Users\mune0>ping 192.168.0.61

Pinging 192.168.0.61 with 32 bytes of data:
Reply from 192.168.0.61: bytes=32 time<1ms TTL=64
Reply from 192.168.0.61: bytes=32 time<1ms TTL=64
Reply from 192.168.0.61: bytes=32 time<1ms TTL=64
Reply from 192.168.0.61: bytes=32 time<1ms TTL=64

Ping statistics for 192.168.0.61:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

C:\Users\mune0>curl 192.168.0.61:9898
curl: (28) Failed to connect to 192.168.0.61 port 9898 after 21003 ms: Could not connect to server

Here are the tcpdump results on the Node:

ubuntu@ubuntu:~/test-svc$ sudo tcpdump 'port 9898'
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:41:09.992512 IP 192.168.0.126.65498 > 192.168.0.62.9898: Flags [S], seq 3667285225, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
16:41:09.992563 IP 192.168.0.126.65498 > 10.16.0.9.9898: Flags [S], seq 3667285225, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
16:41:10.992850 IP 192.168.0.126.65498 > 192.168.0.62.9898: Flags [S], seq 3667285225, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
16:41:10.992916 IP 192.168.0.126.65498 > 10.16.0.9.9898: Flags [S], seq 3667285225, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
16:41:12.993139 IP 192.168.0.126.65498 > 192.168.0.62.9898: Flags [S], seq 3667285225, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
16:41:12.993209 IP 192.168.0.126.65498 > 10.16.0.9.9898: Flags [S], seq 3667285225, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
16:41:16.993475 IP 192.168.0.126.65498 > 192.168.0.62.9898: Flags [S], seq 3667285225, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
16:41:16.993545 IP 192.168.0.126.65498 > 10.16.0.9.9898: Flags [S], seq 3667285225, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
16:41:24.993406 IP 192.168.0.126.65498 > 192.168.0.62.9898: Flags [S], seq 3667285225, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
16:41:24.993469 IP 192.168.0.126.65498 > 10.16.0.9.9898: Flags [S], seq 3667285225, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel
ubuntu@ubuntu:~/test-svc$
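
To narrow down where the SYNs die, it may also help to capture on the backend pod side: if the SYN never shows up there, the drop is in the node datapath; if a SYN-ACK leaves the pod but never appears on eth0, the return path is the problem. A minimal sketch, assuming the pods live in the default namespace and the kubectl-ko plugin is installed (both assumptions, adjust as needed):

```
# Capture inside the backend pod's network namespace via the kube-ovn plugin
kubectl ko tcpdump default/podinfo-6bd97dfb99-rj4g6 tcp port 9898 -nn

# Alternatively, capture on all node interfaces while repeating the curl,
# to see whether SYN-ACKs are generated but lost on the way back
sudo tcpdump -i any -nn 'tcp port 9898 and host 10.16.0.9'
```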

Here are the cilium-dbg results:

ubuntu@ubuntu:~/test-svc$ kubectl -n kube-system exec ds/cilium -- cilium-dbg service list
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
ID   Frontend              Service Type   Backend                            
1    10.43.0.1:443         ClusterIP      1 => 192.168.0.28:6443 (active)    
2    10.43.168.220:443     ClusterIP      1 => 192.168.0.28:4244 (active)    
3    10.43.0.10:53         ClusterIP      1 => 10.16.0.2:53 (active)         
4    10.43.0.10:9153       ClusterIP      1 => 10.16.0.2:9153 (active)       
5    10.43.228.178:6642    ClusterIP      1 => 192.168.0.28:6642 (active)    
6    10.43.114.93:10660    ClusterIP      1 => 192.168.0.28:10660 (active)   
7    10.43.180.141:6641    ClusterIP      1 => 192.168.0.28:6641 (active)    
8    10.43.140.229:6643    ClusterIP      1 => 192.168.0.28:6643 (active)    
9    10.43.217.128:10665   ClusterIP      1 => 192.168.0.28:10665 (active)   
10   10.43.32.223:10661    ClusterIP      1 => 192.168.0.28:10661 (active)   
11   10.43.182.198:8080    ClusterIP      1 => 10.16.0.3:8080 (active)       
12   10.43.159.239:9898    ClusterIP      1 => 10.0.1.3:9898 (active)        
                                          2 => 10.0.1.2:9898 (active)        
13   192.168.0.28:31609    NodePort       1 => 10.0.1.3:9898 (active)        
                                          2 => 10.0.1.2:9898 (active)        
14   0.0.0.0:31609         NodePort       1 => 10.0.1.3:9898 (active)        
                                          2 => 10.0.1.2:9898 (active)        
15   192.168.0.61:9898     LoadBalancer   1 => 10.0.1.3:9898 (active)        
                                          2 => 10.0.1.2:9898 (active)        
19   10.43.92.92:80        ClusterIP      1 => 10.16.0.7:4245 (active)       
20   10.43.112.220:80      ClusterIP      1 => 10.16.0.8:8081 (active)       
21   10.43.5.219:9898      ClusterIP      1 => 10.16.0.10:9898 (active)      
                                          2 => 10.16.0.9:9898 (active)       
22   192.168.0.28:31442    NodePort       1 => 10.16.0.10:9898 (active)      
                                          2 => 10.16.0.9:9898 (active)       
23   0.0.0.0:31442         NodePort       1 => 10.16.0.10:9898 (active)      
                                          2 => 10.16.0.9:9898 (active)       
24   192.168.0.62:9898     LoadBalancer   1 => 10.16.0.10:9898 (active)      
                                          2 => 10.16.0.9:9898 (active)                                   
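
It might also be worth looking at what Cilium's datapath does with these packets, e.g. whether the BPF load-balancer map has the frontend and whether any drops are reported while the curl retries. A rough sketch, assuming the standard cilium-agent container and the cilium-dbg subcommand names as of Cilium 1.16 (adjust if they differ in your build):

```
# Show the BPF load-balancer map (frontend 192.168.0.62:9898 should map to the pod backends)
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg bpf lb list

# Watch for datapath drops while re-running the curl from the client
kubectl -n kube-system exec -it ds/cilium -c cilium-agent -- cilium-dbg monitor --type drop

# Dump the BPF NAT / conntrack state for the flow in question
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg bpf nat list | grep 9898
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg bpf ct list global | grep 9898
```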

Steps To Reproduce

The environment setup commands are listed below.

Environment Setup Commands

k3s

```
export INSTALL_K3S_VERSION=v1.31.2+k3s1
curl -sfL https://get.k3s.io | sh -s - \
  --disable=servicelb \
  --disable=traefik \
  --disable=metrics-server \
  --flannel-backend=none \
  --disable-kube-proxy \
  --disable-network-policy \
  --disable-helm-controller \
  --disable-cloud-controller \
  --write-kubeconfig-mode 644 \
  --write-kubeconfig ~/.kube/config
```

cilium

```
cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cni-configuration
  namespace: kube-system
data:
  cni-config: |-
    {
      "name": "generic-veth",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "kube-ovn",
          "server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
          "ipam": {
            "type": "kube-ovn",
            "server_socket": "/run/openvswitch/kube-ovn-daemon.sock"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        },
        {
          "type": "cilium-cni"
        }
      ]
    }
EOF
```

```
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set operator.replicas=1 \
  --set k8sServiceHost=${SERVER_IP} \
  --set k8sServicePort=6443 \
  --set kubeProxyReplacement=true \
  --set socketLB.enabled=true \
  --set nodePort.enabled=true \
  --set externalIPs.enabled=true \
  --set hostPort.enabled=false \
  --set routingMode=native \
  --set sessionAffinity=true \
  --set enableIPv4Masquerade=false \
  --set enableIPv6Masquerade=false \
  --set hubble.enabled=true \
  --set sctp.enabled=true \
  --set ipv4.enabled=true \
  --set ipv6.enabled=false \
  --set ipam.mode=cluster-pool \
  --set-json ipam.operator.clusterPoolIPv4PodCIDRList='["100.65.0.0/16"]' \
  --set-json ipam.operator.clusterPoolIPv6PodCIDRList='["fd00:100:65::/112"]' \
  --set cni.chainingMode=generic-veth \
  --set cni.chainingTarget=kube-ovn \
  --set cni.customConf=true \
  --set cni.configMap=cni-configuration
```

kube-ovn

```
kubectl label node -lbeta.kubernetes.io/os=linux kubernetes.io/os=linux --overwrite
kubectl label node -lnode-role.kubernetes.io/control-plane kube-ovn/role=master --overwrite
kubectl label node -lovn.kubernetes.io/ovs_dp_type!=userspace ovn.kubernetes.io/ovs_dp_type=kernel --overwrite

helm repo add kubeovn https://kubeovn.github.io/kube-ovn/
helm install kube-ovn kubeovn/kube-ovn \
  --set MASTER_NODES=${SERVER_IP} \
  --set func.ENABLE_NP=false \
  --set func.ENABLE_LB_SVC=true \
  --set func.ENABLE_TPROXY=true \
  --set cni_conf.CNI_CONFIG_PRIORITY=10
```

multus

```
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset.yml
```

external-network.yaml

```
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: ovn-vpc-external-network
spec:
  protocol: IPv4
  provider: ovn-vpc-external-network.kube-system
  cidrBlock: 192.168.0.0/24
  gateway: 192.168.0.1  # IP address of the physical gateway
  excludeIps:
  - 192.168.0.1..192.168.0.60
  - 192.168.0.80..192.168.0.254
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovn-vpc-external-network
  namespace: kube-system
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": {
        "type": "kube-ovn",
        "server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
        "provider": "ovn-vpc-external-network.kube-system"
      }
    }'
```

deployment.yaml

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  # namespace: vpc1
  labels:
    app: podinfo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
      - name: podinfo
        image: ghcr.io/stefanprodan/podinfo:6.1.5
        ports:
        - containerPort: 9898
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
```

svc.yaml

```
apiVersion: v1
kind: Service
metadata:
  name: podinfo
  # namespace: vpc1
  annotations:
    lb-svc-attachment.kube-system.kubernetes.io/logical_switch: ovn-vpc-external-network
    ovn.kubernetes.io/attachmentprovider: ovn-vpc-external-network.kube-system
  labels:
    app: podinfo
spec:
  type: LoadBalancer
  ports:
  - port: 9898
    targetPort: 9898
    protocol: TCP
  selector:
    app: podinfo
```
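
For reference, one way to sanity-check that the generic-veth chaining actually took effect on the node is to look at the rendered CNI configuration and the agent's own status report. A rough sketch; the file name under /etc/cni/net.d/ depends on cni_conf.CNI_CONFIG_PRIORITY, and the exact status wording may differ between Cilium releases:

```
# The chained config should list kube-ovn, portmap and cilium-cni in that order
sudo cat /etc/cni/net.d/*.conflist

# Confirm Cilium reports chaining mode and kube-proxy replacement as enabled
kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg status --verbose | grep -iE 'chaining|kube.?proxy'
```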

Current Behavior

The current system behavior shows several issues:

  1. ICMP (ping) works to the LoadBalancer IP (192.168.0.62), but TCP connection (curl) fails:

    • Ping to 192.168.0.62 succeeds
    • Curl to 192.168.0.62:9898 times out after 21 seconds
  2. From tcpdump analysis:

    • SYN packets arrive at the LoadBalancer IP (192.168.0.62:9898)
    • The same SYNs are forwarded (DNATed) to the backend pod IP (10.16.0.9:9898)
    • No SYN-ACK responses are observed for either destination
  3. Cilium service list shows:

    • LoadBalancer service is properly configured (ID 24)
    • Frontend IP is 192.168.0.62:9898
    • Two active backends: 10.16.0.10:9898 and 10.16.0.9:9898
    • NodePort services are also configured on port 31442
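
To isolate which hop fails, it may help to hit each address of the same Service directly from the node: if the pod IP and ClusterIP respond but the NodePort and LoadBalancer IP do not, the problem lies in the external datapath. A small sketch using the addresses from the outputs above:

```
# Backend pod directly (bypasses all service translation)
curl -sv --max-time 5 http://10.16.0.9:9898/

# ClusterIP (socket-LB path on the node)
curl -sv --max-time 5 http://10.43.5.219:9898/

# NodePort and LoadBalancer IP (external datapath)
curl -sv --max-time 5 http://192.168.0.28:31442/
curl -sv --max-time 5 http://192.168.0.62:9898/
```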

Expected Behavior

  1. LoadBalancer Service should be fully functional:

    • TCP traffic to LoadBalancer IP (192.168.0.62:9898) should successfully reach the backend pods
    • Curl requests to 192.168.0.62:9898 should receive responses from the podinfo service
    • Service should properly load balance traffic between both backend pods (10.16.0.9 and 10.16.0.10)
  2. Network flow should work as follows:

    • Client sends request to LoadBalancer IP
    • Request is properly routed through Cilium's load balancing
    • Backend pod receives the traffic and responds
    • Response packets successfully return to the client
  3. Both Layer 3 (IP) and Layer 4 (TCP) connectivity should work:

    • Ping (ICMP) continues to work as it currently does
    • TCP connections should establish successfully
    • Full TCP handshake (SYN, SYN-ACK, ACK) should complete
  4. The LoadBalancer service should provide the same functionality as if using the default kube-proxy, despite running in Cilium's kube-proxy replacement mode.
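
Once this works, it could be verified end to end with something like the sketch below (run the tcpdump on the node while the curl runs from the client; the exact output will vary):

```
# The full three-way handshake (SYN, SYN-ACK, ACK) should be visible on the node
sudo tcpdump -ni eth0 'tcp port 9898 and host 192.168.0.62'

# The LoadBalancer IP should answer with podinfo's JSON greeting
curl -s http://192.168.0.62:9898/
```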

dosubot[bot] commented 2 hours ago

Hey there, @MuNeNICK! I'm here to help you with any issues or questions you have. Let's work together to solve this bug!

The issue you're experiencing with the LoadBalancer service in a Cilium Kube-Proxy Replacement environment might be related to the integration between Kube-OVN and Cilium, especially in handling TCP connections. Here are some steps and considerations to help troubleshoot and potentially resolve the problem:

  1. Check Kube-OVN and Cilium Compatibility: Ensure that the versions of Kube-OVN and Cilium you are using are compatible, especially with the kube-proxy replacement feature. There might be specific configurations or annotations required for them to work together, which are not detailed in the available documentation.

  2. Review LoadBalancer Configuration: Double-check the annotations and configurations in your svc.yaml to ensure they are correctly set for the environment. The annotations related to logical switch and attachment provider should match the network setup.

  3. Inspect Network Policies: If there are any network policies applied, ensure they allow traffic between the LoadBalancer IP and the backend pods. Network policies might inadvertently block the TCP traffic.

  4. Use Kube-OVN Debugging Tools: Utilize the debugging tools provided by Kube-OVN to trace the network flow and identify where the packets might be getting dropped. You can use kubectl-ko commands to gather more insights:

    • kubectl ko trace <pod> <args...> to trace packets through the network (see the sketch after this list).
    • kubectl ko log kube-ovn all to collect logs from all Kube-OVN components [1][2].
  5. Check Cilium Logs and Debugging: Use Cilium's debugging tools to ensure that the service is correctly configured and that there are no issues with the load balancing or packet forwarding.

  6. Verify DNAT Rules: Ensure that the DNAT rules are correctly set up in Kube-OVN for the LoadBalancer service. The rules should properly translate the external IP to the internal pod IPs [3].
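
For steps 4 and 6, a concrete starting point might look like the sketch below. The pod name is taken from the outputs above, the default namespace is assumed, and the kubectl-ko trace syntax follows the Kube-OVN docs; adjust to your release:

```
# Trace a TCP packet from a backend pod toward the client to check the OVN return path
kubectl ko trace default/podinfo-6bd97dfb99-rj4g6 192.168.0.126 tcp 9898

# Collect logs from all Kube-OVN components
kubectl ko log kube-ovn all

# List the OVN load balancers and their VIP-to-backend translations
kubectl ko nbctl lb-list
```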

If these steps do not resolve the issue, further investigation into the specific integration between Kube-OVN and Cilium might be necessary, as there could be undocumented compatibility issues or additional configuration requirements.

To continue talking to Dosu, mention @dosu.

MuNeNICK commented 2 hours ago

The LoadBalancer worked correctly in the same Kube-OVN + Cilium environment when kube-proxy replacement was not enabled, as shown in the following procedure:

https://memos.munenick.me/m/fvXeKkZqADtD9HxSctBMDN#comments