MuNeNICK opened this issue
Hey there, @MuNeNICK! I'm here to help you with any issues or questions you have. Let's work together to solve this bug!
The issue you're experiencing with the LoadBalancer service in a Cilium Kube-Proxy Replacement environment might be related to the integration between Kube-OVN and Cilium, especially in handling TCP connections. Here are some steps and considerations to help troubleshoot and potentially resolve the problem:
Check Kube-OVN and Cilium Compatibility: Ensure that the versions of Kube-OVN and Cilium you are using are compatible, especially with the kube-proxy replacement feature. There might be specific configurations or annotations required for them to work together, which are not detailed in the available documentation.
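To confirm what is actually deployed before cross-checking any compatibility notes, something like the following works (the label selectors are the defaults used by the Kube-OVN and Cilium charts and may differ in other installs):

```
# Image tags of the running Kube-OVN and Cilium components
kubectl -n kube-system get pods -l app=kube-ovn-controller \
  -o jsonpath='{.items[*].spec.containers[*].image}{"\n"}'
kubectl -n kube-system get pods -l k8s-app=cilium \
  -o jsonpath='{.items[*].spec.containers[*].image}{"\n"}'

# If the Cilium CLI is installed, it also reports client and agent versions
cilium version
```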
Review LoadBalancer Configuration: Double-check the annotations and configuration in your svc.yaml to ensure they are correctly set for the environment. The annotations for the logical switch and the attachment provider should match the network setup.
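A quick way to see what the API server actually has, as a sketch (the object names come from the svc.yaml and external-network.yaml in the Steps To Reproduce):

```
# Annotations and external IP the Service actually carries
kubectl get svc podinfo -o yaml

# Confirm the referenced attachment network and subnet exist
kubectl -n kube-system get network-attachment-definitions ovn-vpc-external-network
kubectl get subnet ovn-vpc-external-network
```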
Inspect Network Policies: If there are any network policies applied, ensure they allow traffic between the LoadBalancer IP and the backend pods. Network policies might inadvertently block the TCP traffic.
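Listing both the Kubernetes and the Cilium policy objects is a cheap way to rule this out, for example:

```
# Plain Kubernetes NetworkPolicies in every namespace
kubectl get networkpolicies -A

# Cilium-specific policies (the CRDs are installed by the Cilium chart)
kubectl get ciliumnetworkpolicies -A
kubectl get ciliumclusterwidenetworkpolicies -A
```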
Use Kube-OVN Debugging Tools: Utilize the debugging tools provided by Kube-OVN to trace the network flow and identify where packets might be getting dropped. You can use kubectl-ko commands to gather more insight; a few illustrative checks are sketched below.
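These examples are assumptions based on this issue's setup: the pod name suffix is a placeholder and 192.168.0.62 is the LoadBalancer IP reported later in the issue.

```
# Overall health check of the Kube-OVN components and OVN databases
kubectl ko diagnose all

# Simulate a TCP packet from a backend pod toward the LoadBalancer IP
kubectl ko trace default/podinfo-<pod-suffix> 192.168.0.62 tcp 9898

# List the load balancers programmed into the OVN northbound database
kubectl ko nbctl lb-list
```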
Check Cilium Logs and Debugging: Use Cilium's debugging tools to ensure that the service is correctly configured and that there are no issues with the load balancing or packet forwarding.
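A minimal sketch of such checks, assuming the agent DaemonSet is named cilium in kube-system (in Cilium 1.16 the in-pod binary is cilium-dbg):

```
# Agent health, including the kube-proxy replacement status
kubectl -n kube-system exec ds/cilium -- cilium-dbg status --verbose

# Watch for dropped packets while re-running the failing curl
kubectl -n kube-system exec ds/cilium -- cilium-dbg monitor --type drop
```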
Verify DNAT Rules: Ensure that the DNAT rules are correctly set up in Kube-OVN for the LoadBalancer service. The rules should properly translate the external IP to the internal pod IPs [3].
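With func.ENABLE_LB_SVC=true, Kube-OVN typically runs a dedicated lb-svc pod for the Service that performs the DNAT; a sketch of how to inspect it (the lb-svc-podinfo name is an assumption derived from the Service name):

```
# Find the load-balancer pod Kube-OVN created for the Service
kubectl get pods -o wide | grep lb-svc

# Inspect its NAT rules; 192.168.0.62:9898 should be translated to the backends
kubectl exec deploy/lb-svc-podinfo -- iptables -t nat -S
```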
If these steps do not resolve the issue, further investigation into the specific integration between Kube-OVN and Cilium might be necessary, as there could be undocumented compatibility issues or additional configuration requirements.
To continue talking to Dosu, mention @dosu.
The LoadBalancer worked correctly in the integrated Kube-OVN and Cilium environment when not using Kube-Proxy Replacement mode, as shown in the following procedure.
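The exact working procedure is not reproduced here; as a rough sketch of the difference (assuming kube-proxy itself was left enabled in that setup), only the kube-proxy replacement flag changes:

```
# Same chart values as in Steps To Reproduce, but without kube-proxy replacement
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set kubeProxyReplacement=false
```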
Kube-OVN Version
v1.12.28
Cilium Version
1.16.3
Kubernetes Version
Client Version: v1.31.2+k3s1 Kustomize Version: v5.4.2 Server Version: v1.31.2+k3s1
Operation-system/Kernel Version
"Ubuntu 22.04.5 LTS" 5.15.0-125-generic
Description
Hello.
I want to use Kube-OVN's LoadBalancer in a Cilium Kube-Proxy Replacement environment.
The LoadBalancer type service I created as shown below is not working:
Ping works, but curl doesn't:
Here are the tcpdump results on the Node:
Here are the Cilium dbg results:
Steps To Reproduce
The environment setup commands used are listed below.
Environment Setup Commands
k3s
```
export INSTALL_K3S_VERSION=v1.31.2+k3s1
curl -sfL https://get.k3s.io | sh -s - \
  --disable=servicelb \
  --disable=traefik \
  --disable=metrics-server \
  --flannel-backend=none \
  --disable-kube-proxy \
  --disable-network-policy \
  --disable-helm-controller \
  --disable-cloud-controller \
  --write-kubeconfig-mode 644 \
  --write-kubeconfig ~/.kube/config
```
cilium
```
cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cni-configuration
  namespace: kube-system
data:
  cni-config: |-
    {
      "name": "generic-veth",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "kube-ovn",
          "server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
          "ipam": {
            "type": "kube-ovn",
            "server_socket": "/run/openvswitch/kube-ovn-daemon.sock"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        },
        {
          "type": "cilium-cni"
        }
      ]
    }
EOF
```
```
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set operator.replicas=1 \
  --set k8sServiceHost=${SERVER_IP} \
  --set k8sServicePort=6443 \
  --set kubeProxyReplacement=true \
  --set socketLB.enabled=true \
  --set nodePort.enabled=true \
  --set externalIPs.enabled=true \
  --set hostPort.enabled=false \
  --set routingMode=native \
  --set sessionAffinity=true \
  --set enableIPv4Masquerade=false \
  --set enableIPv6Masquerade=false \
  --set hubble.enabled=true \
  --set sctp.enabled=true \
  --set ipv4.enabled=true \
  --set ipv6.enabled=false \
  --set ipam.mode=cluster-pool \
  --set-json ipam.operator.clusterPoolIPv4PodCIDRList='["100.65.0.0/16"]' \
  --set-json ipam.operator.clusterPoolIPv6PodCIDRList='["fd00:100:65::/112"]' \
  --set cni.chainingMode=generic-veth \
  --set cni.chainingTarget=kube-ovn \
  --set cni.customConf=true \
  --set cni.configMap=cni-configuration
```
kube-ovn
```
kubectl label node -lbeta.kubernetes.io/os=linux kubernetes.io/os=linux --overwrite
kubectl label node -lnode-role.kubernetes.io/control-plane kube-ovn/role=master --overwrite
kubectl label node -lovn.kubernetes.io/ovs_dp_type!=userspace ovn.kubernetes.io/ovs_dp_type=kernel --overwrite

helm repo add kubeovn https://kubeovn.github.io/kube-ovn/
helm install kube-ovn kubeovn/kube-ovn \
  --set MASTER_NODES=${SERVER_IP} \
  --set func.ENABLE_NP=false \
  --set func.ENABLE_LB_SVC=true \
  --set func.ENABLE_TPROXY=true \
  --set cni_conf.CNI_CONFIG_PRIORITY=10
```
multus
```
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset.yml
```
external-network.yaml
```
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: ovn-vpc-external-network
spec:
  protocol: IPv4
  provider: ovn-vpc-external-network.kube-system
  cidrBlock: 192.168.0.0/24
  gateway: 192.168.0.1  # IP address of the physical gateway
  excludeIps:
  - 192.168.0.1..192.168.0.60
  - 192.168.0.80..192.168.0.254
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovn-vpc-external-network
  namespace: kube-system
spec:
  config: '{
    "cniVersion": "0.3.0",
    "type": "macvlan",
    "master": "eth0",
    "mode": "bridge",
    "ipam": {
      "type": "kube-ovn",
      "server_socket": "/run/openvswitch/kube-ovn-daemon.sock",
      "provider": "ovn-vpc-external-network.kube-system"
    }
  }'
```
deployment.yaml
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  # namespace: vpc1
  labels:
    app: podinfo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
      - name: podinfo
        image: ghcr.io/stefanprodan/podinfo:6.1.5
        ports:
        - containerPort: 9898
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
```
svc.yaml
```
apiVersion: v1
kind: Service
metadata:
  name: podinfo
  # namespace: vpc1
  annotations:
    lb-svc-attachment.kube-system.kubernetes.io/logical_switch: ovn-vpc-external-network
    ovn.kubernetes.io/attachmentprovider: ovn-vpc-external-network.kube-system
  labels:
    app: podinfo
spec:
  type: LoadBalancer
  ports:
  - port: 9898
    targetPort: 9898
    protocol: TCP
  selector:
    app: podinfo
```
Current Behavior
The current system behavior shows several issues:
ICMP (ping) to the LoadBalancer IP (192.168.0.62) works, but TCP connections (curl) fail:
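The checks themselves look roughly like this, run from a host on the 192.168.0.0/24 network:

```
# Layer 3: reachability to the LoadBalancer IP succeeds
ping -c 3 192.168.0.62

# Layer 4: the TCP connection to the published port times out
curl -v --connect-timeout 5 http://192.168.0.62:9898
```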
From tcpdump analysis:
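A node-side capture along these lines shows whether the TCP SYN reaches the node and whether anything answers (the interface name is environment-specific):

```
# Traffic to/from the LoadBalancer IP on the node's external interface
sudo tcpdump -ni eth0 'host 192.168.0.62 and (icmp or tcp port 9898)'
```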
Cilium service list shows:
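Equivalent commands to reproduce that listing from the agent pod (again assuming the cilium-dbg binary shipped with Cilium 1.16):

```
# Service-to-backend translation table as Cilium sees it
kubectl -n kube-system exec ds/cilium -- cilium-dbg service list

# Corresponding BPF load-balancer map entries
kubectl -n kube-system exec ds/cilium -- cilium-dbg bpf lb list
```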
Expected Behavior
The LoadBalancer Service should be fully functional.
The network flow from the external client through the LoadBalancer IP (192.168.0.62) to the backend pods should work end to end.
Both Layer 3 (IP) and Layer 4 (TCP) connectivity should work, i.e. both ping and curl against the LoadBalancer IP should succeed.
The LoadBalancer service should provide the same functionality as if using the default kube-proxy, despite running in Cilium's kube-proxy replacement mode.