Pod network is not ready at startup; connections are always delayed for a few milliseconds

buffge commented 2 weeks ago

Expected Behavior

When a pod starts, connecting to a cluster IP should succeed immediately. Currently the network only becomes usable a few milliseconds after startup.

Current Behavior

For the first few milliseconds after pod startup, packets sent out receive no response.

Possible Solution

I guess something is not ready at pod startup time.

Steps to Reproduce (for bugs)

kubectl run --restart='Never' --rm -it nettest --image=busybox:1.36-musl -- sh -c '
echo -e "cluster address:\n"
echo -e "\n"| time telnet mysql.default.svc.cluster.local 3306  # delay 1s, real time is 1.00x second
echo -e "\n"| time telnet mysql.default.svc.cluster.local 3306  # no delay, real time is 0.00x second
'

kubectl run --restart='Never' --rm -it nettest --image=busybox:1.36-musl -- sh -c '
echo -e "cluster service ip:\n"
echo -e "\n"| time telnet 10.105.46.176 3306  # delay 1s, real time is 1.00x second
echo -e "\n"| time telnet 10.105.46.176 3306  # no delay, real time is 0.00x second
'

kubectl run --restart='Never' --rm -it nettest --image=busybox:1.36-musl -- sh -c '
echo -e "cluster pod ip:\n"
echo -e "\n"| time telnet 10.234.183.16 3306  # delay 1s, real time is 1.00x second
echo -e "\n"| time telnet 10.234.183.16 3306  # no delay, real time is 0.00x second
'

kubectl run --restart='Never' --rm -it nettest --image=busybox:1.36-musl -- sh -c '
echo -e "cluster node lan ip:\n"
echo -e "\n"| time telnet 192.168.1.6 6443  # no delay, real time is 0.00x second
echo -e "\n"| time telnet 192.168.1.6 6443  # no delay, real time is 0.00x second
'

kubectl run --restart='Never' --rm -it nettest --image=busybox:1.36-musl -- sh -c '
echo -e "wan ip:\n"
echo -e "\n"| time telnet 1.1.1.1 80  # delay 1s, real time is 1.0xx second
echo -e "\n"| time telnet 1.1.1.1 80  # no delay, real time is 0.0xx second
'
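
The reproduction above relies on `time`, which only shows that the first attempt takes about one second. A finer-grained series of measurements may help show how long after startup connections begin to succeed. The following is a minimal sketch, not part of the original report: `elapsed_ms` and `probe_series` are hypothetical helper names, and the script assumes `date +%s%N` prints nanoseconds (true for GNU coreutils, not guaranteed for every BusyBox build). As an interpretation only, a delay of almost exactly one second is consistent with the first SYN being lost and retransmitted after Linux's 1-second initial retransmission timeout; the thread does not confirm this.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print its wall-clock duration in ms.
# Assumption: `date +%s%N` yields seconds followed by nanoseconds.
elapsed_ms() {
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1          # exit status is ignored; only the duration matters
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Hypothetical helper: run the same probe command N times in a row,
# printing the attempt number and how long each attempt took.
probe_series() {
  count=$1
  shift
  i=1
  while [ "$i" -le "$count" ]; do
    echo "attempt $i: $(elapsed_ms "$@") ms"
    i=$((i + 1))
  done
}
```

Inside the test pod, something like `probe_series 5 sh -c 'echo | telnet 10.105.46.176 3306'` would reuse the telnet probe from the report. If only the first attempt is slow, the delay is tied to pod startup rather than to the destination.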

Context

I did not have this problem during the past few months. Yesterday there was a DNS issue with the pod, which often caused a 5-second DNS delay. I solved that by installing LocalDNS. Then this new problem appeared. I upgraded Calico from 3.27.3 to 3.28.1, but that did not solve it.

Your Environment

Calico version 3.27.3 and 3.28.1
Orchestrator version: kubernetes v1.29.0

Operating System and version: Ubuntu 22.04, kernel 5.15.0-119-generic

Tigera config:

installation:
  cni:
    type: Calico
  calicoNetwork:
    bgp: Disabled
    linuxDataplane: BPF
    ipPools:
      - cidr: 10.234.0.0/16
        encapsulation: VXLAN
    nodeAddressAutodetectionV4:
      kubernetes: NodeInternalIP
apiServer:
  enabled: true

k8s proxy mode is ipvs

tomastigera commented 2 weeks ago

k8s proxy mode is ipvs

We do not support switching to eBPF mode from ipvs. You first need to switch kube-proxy to iptables mode, then disable it and switch to eBPF.

https://docs.tigera.io/calico/3.28/operations/ebpf/install#disable-kube-proxy-or-avoid-conflicts
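
For reference, the last two steps of that sequence (disabling kube-proxy, then enabling the eBPF dataplane) can be sketched roughly as below, following the approach in the linked documentation for an operator-managed install. This is a hedged sketch: the function names and the `KUBECTL` override variable are hypothetical conveniences, and the earlier step of moving kube-proxy from ipvs to iptables mode is not shown.

```shell
#!/bin/sh
# KUBECTL is a hypothetical override so the commands can be dry-run
# (for example KUBECTL=echo prints them instead of executing them).
KUBECTL="${KUBECTL:-kubectl}"

# Keep kube-proxy off every node by giving its DaemonSet a nodeSelector
# that no node matches. The DaemonSet itself is left in place.
disable_kube_proxy() {
  "$KUBECTL" patch ds -n kube-system kube-proxy \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico":"true"}}}}}'
}

# Ask the Tigera operator to use the eBPF dataplane.
enable_bpf_dataplane() {
  "$KUBECTL" patch installation.operator.tigera.io default --type merge \
    -p '{"spec":{"calicoNetwork":{"linuxDataplane":"BPF"}}}'
}
```

Running `disable_kube_proxy` before `enable_bpf_dataplane` matches the order described in this comment.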

buffge commented 2 weeks ago

k8s proxy mode is ipvs

We do not support switching to eBPF mode from ipvs. You first need to switch kube-proxy to iptables mode, then disable it and switch to eBPF.

https://docs.tigera.io/calico/3.28/operations/ebpf/install#disable-kube-proxy-or-avoid-conflicts

I have done that, but the problem still persists.

caseydavenport commented 1 week ago

k8s proxy mode is ipvs

Like @tomastigera said, we don't support Calico in BPF mode with kube-proxy. You should remove kube-proxy and instead use Calico's built-in eBPF Service implementation.
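
One way to confirm that a cluster is actually in that state is to check that no kube-proxy pods are scheduled and that the operator reports the BPF dataplane. The sketch below is illustrative only: the function names and the `KUBECTL` override are hypothetical, and it assumes an operator-managed install whose Installation resource is named `default`.

```shell
#!/bin/sh
# KUBECTL is a hypothetical override (useful for testing with a stub).
KUBECTL="${KUBECTL:-kubectl}"

# Print how many nodes the kube-proxy DaemonSet still wants to run on.
# Prints nothing if the DaemonSet has been deleted.
kube_proxy_desired() {
  "$KUBECTL" get ds -n kube-system kube-proxy \
    -o jsonpath='{.status.desiredNumberScheduled}'
}

# Print the dataplane configured on the operator's Installation resource.
calico_dataplane() {
  "$KUBECTL" get installation.operator.tigera.io default \
    -o jsonpath='{.spec.calicoNetwork.linuxDataplane}'
}

# Report whether kube-proxy is gone and the dataplane is BPF.
check_ebpf_prereqs() {
  desired=$(kube_proxy_desired 2>/dev/null)
  dataplane=$(calico_dataplane 2>/dev/null)
  # An empty value (DaemonSet not found) is treated the same as 0.
  if [ "${desired:-0}" != "0" ]; then
    echo "kube-proxy still scheduled on $desired node(s)"
    return 1
  fi
  if [ "$dataplane" != "BPF" ]; then
    echo "dataplane is '${dataplane}', not BPF"
    return 1
  fi
  echo "ok: kube-proxy absent, dataplane BPF"
}
```

A passing check only shows that the configuration matches what is described here; it does not by itself explain the startup delay.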

buffge commented 4 days ago

k8s proxy mode is ipvs

Like @tomastigera said, we don't support Calico in BPF mode with kube-proxy. You should remove kube-proxy and instead use Calico's built-in eBPF Service implementation.

I removed kube-proxy, but the problem still persists. Can you give me some suggestions for troubleshooting?

projectcalico / calico