projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

pod startup network not ready, always delay few milliseconds #9204

Open buffge opened 2 weeks ago

buffge commented 2 weeks ago



Expected Behavior

When a pod starts, connections to a cluster IP should succeed immediately. Currently the network only becomes usable a few milliseconds after startup.

Current Behavior

For the first few milliseconds after pod startup, packets sent from the pod receive no response.

Possible Solution

I guess that something is not ready yet at pod startup time.

Steps to Reproduce (for bugs)

kubectl run --restart='Never' --rm -it nettest --image=busybox:1.36-musl -- sh -c '
  echo -e "cluster address:\n"
  echo -e "\n"| time telnet mysql.default.svc.cluster.local 3306 # delay 1s, real time is 1.00x second
  echo -e "\n"| time telnet mysql.default.svc.cluster.local 3306 # no delay, real time is 0.00x second
'

kubectl run --restart='Never' --rm -it nettest --image=busybox:1.36-musl -- sh -c '
  echo -e "cluster service ip:\n"
  echo -e "\n"| time telnet 10.105.46.176 3306 # delay 1s, real time is 1.00x second
  echo -e "\n"| time telnet 10.105.46.176 3306 # no delay, real time is 0.00x second
'

kubectl run --restart='Never' --rm -it nettest --image=busybox:1.36-musl -- sh -c '
  echo -e "cluster pod ip:\n"
  echo -e "\n"| time telnet 10.234.183.16 3306 # delay 1s, real time is 1.00x second
  echo -e "\n"| time telnet 10.234.183.16 3306 # no delay, real time is 0.00x second
'

kubectl run --restart='Never' --rm -it nettest --image=busybox:1.36-musl -- sh -c '
  echo -e "cluster node lan ip:\n"
  echo -e "\n"| time telnet 192.168.1.6 6443 # no delay, real time is 0.00x second
  echo -e "\n"| time telnet 192.168.1.6 6443 # no delay, real time is 0.00x second
'

kubectl run --restart='Never' --rm -it nettest --image=busybox:1.36-musl -- sh -c '
  echo -e "wan ip:\n"
  echo -e "\n"| time telnet 1.1.1.1 80 # delay 1s, real time is 1.0xx second
  echo -e "\n"| time telnet 1.1.1.1 80 # no delay, real time is 0.0xx second
'
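
The telnet commands above only compare a first and a second attempt. A 1.00x-second first attempt would be consistent with the first SYN being dropped and retransmitted about one second later, though nothing in this report confirms that. A rough way to count how many attempts fail before the network is usable is a small retry loop. The sketch below is an illustration only: `retry_until_ok` is a hypothetical helper name, not something from Calico or busybox.

```shell
#!/bin/sh
# Hypothetical helper (not part of Calico or this issue): run a command
# repeatedly until it succeeds, then print how many attempts were needed.
# Usage: retry_until_ok MAX_TRIES DELAY_SECONDS CMD [ARGS...]
retry_until_ok() {
  max=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$max" ]; do
    # Discard the probe's own output; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      echo "$n"
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  # Every attempt failed.
  return 1
}
```

Inside the test pod it could be called as `retry_until_ok 50 0.05 nc -z -w 1 10.105.46.176 3306`, assuming the image's `nc` supports `-z` and `-w` and its `sleep` accepts fractional seconds. The per-attempt timeout limits how finely the delay can be resolved, so a result of 1 means the first probe succeeded and anything higher means at least one probe failed after startup.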

Context

I had not seen this problem during the past few months. Yesterday there was a DNS issue with the pod that often caused a 5-second DNS delay, which I solved by installing LocalDNS. Then this new problem appeared. I upgraded Calico from 3.27.3 to 3.28.1, but that did not solve it.

Your Environment



tomastigera commented 2 weeks ago

k8s proxy mode is ipvs

We do not support switching to eBPF mode from IPVS. You first need to switch kube-proxy to iptables mode, then disable kube-proxy and switch to eBPF.

https://docs.tigera.io/calico/3.28/operations/ebpf/install#disable-kube-proxy-or-avoid-conflicts
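
As a rough illustration of that order of operations, the sketch below prints the commands instead of running them by default. It is not taken from this thread. It assumes a kubeadm-style cluster where kube-proxy runs as a DaemonSet configured through the `kube-proxy` ConfigMap in `kube-system`, and an operator-managed Calico install. `run` and `migrate_to_ebpf` are hypothetical helper names, and the prerequisites described on the linked page are omitted.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default) each command is printed
# rather than executed. "run" is a hypothetical wrapper.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

migrate_to_ebpf() {
  # 1. Switch kube-proxy from ipvs to iptables mode (assumption: the mode
  #    is set in the kube-proxy ConfigMap), then restart it.
  run kubectl -n kube-system edit configmap kube-proxy
  run kubectl -n kube-system rollout restart daemonset kube-proxy

  # 2. Disable kube-proxy by giving its DaemonSet a node selector that
  #    matches no nodes.
  run kubectl patch ds -n kube-system kube-proxy \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}'

  # 3. Switch the Calico dataplane to eBPF (assumption: operator install).
  run kubectl patch installation.operator.tigera.io default --type merge \
    -p '{"spec":{"calicoNetwork":{"linuxDataplane":"BPF"}}}'
}
```

Calling `migrate_to_ebpf` prints the four commands for review. Setting `DRY_RUN=0` would execute them, which should only be done after following the prerequisites in the linked documentation.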

buffge commented 2 weeks ago

k8s proxy mode is ipvs

We do not support switching to eBPF mode from IPVS. You first need to switch kube-proxy to iptables mode, then disable kube-proxy and switch to eBPF.

https://docs.tigera.io/calico/3.28/operations/ebpf/install#disable-kube-proxy-or-avoid-conflicts

I have done that, but the problem still persists.

caseydavenport commented 1 week ago

k8s proxy mode is ipvs

Like @tomastigera said, we don't support Calico in BPF mode with kube-proxy. You should remove kube-proxy and instead use Calico's built-in eBPF Service implementation.

buffge commented 4 days ago

k8s proxy mode is ipvs

Like @tomastigera said, we don't support Calico in BPF mode with kube-proxy. You should remove kube-proxy and instead use Calico's built-in eBPF Service implementation.

I removed kube-proxy, but the problem still persists. Can you give me some suggestions for troubleshooting?