2024-06-19 02:12:07.863 [DEBUG][13926] felix/health.go 331: Calculated health summary: live=true ready=false
+---------------------------+---------+----------------+---------------------+-----------------+
| COMPONENT | TIMEOUT | LIVENESS | READINESS | DETAIL |
+---------------------------+---------+----------------+---------------------+-----------------+
| BPFEndpointManager | - | - | reporting non-ready | Not yet synced. |
| CalculationGraph | 30s | reporting live | reporting ready | |
| FelixStartup | - | reporting live | reporting ready | |
| InternalDataplaneMainLoop | 1m30s | reporting live | reporting non-ready | |
+---------------------------+---------+----------------+---------------------+-----------------+
2024-06-19 02:12:07.863 [INFO][13926] felix/health.go 336: Overall health status changed: live=true ready=false
+---------------------------+---------+----------------+---------------------+-----------------+
| COMPONENT | TIMEOUT | LIVENESS | READINESS | DETAIL |
+---------------------------+---------+----------------+---------------------+-----------------+
| BPFEndpointManager | - | - | reporting non-ready | Not yet synced. |
| CalculationGraph | 30s | reporting live | reporting ready | |
| FelixStartup | - | reporting live | reporting ready | |
| InternalDataplaneMainLoop | 1m30s | reporting live | reporting non-ready | |
+---------------------------+---------+----------------+---------------------+-----------------+
ss -tunlp |grep bird
tcp LISTEN 0 8 0.0.0.0:179 0.0.0.0:* users:(("bird",pid=14049,fd=7))
curl http://localhost:9099/liveness
+---------------------------+---------+----------------+---------------------+-----------------+
| COMPONENT | TIMEOUT | LIVENESS | READINESS | DETAIL |
+---------------------------+---------+----------------+---------------------+-----------------+
| BPFEndpointManager | - | - | reporting non-ready | Not yet synced. |
| CalculationGraph | 30s | reporting live | reporting ready | |
| FelixStartup | - | reporting live | reporting ready | |
| InternalDataplaneMainLoop | 1m30s | reporting live | reporting non-ready | |
+---------------------------+---------+----------------+---------------------+-----------------+
kubectl get ippool default-ipv4-ippool -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  annotations:
    projectcalico.org/metadata: '{"creationTimestamp":"2024-06-19T01:57:00Z"}'
  creationTimestamp: "2024-06-19T01:57:00Z"
  generation: 1
  name: default-ipv4-ippool
  resourceVersion: "989"
  uid: e6779b53-a976-43cc-999e-52eef4e3b026
spec:
  allowedUses:
  - Workload
  - Tunnel
  blockSize: 26
  cidr: 10.244.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never
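For reference, the reason the BPF dataplane never syncs usually shows up in the calico-node logs. Assuming a standard manifest install (calico-node DaemonSet and container in kube-system), something like the following pulls the relevant lines:
kubectl -n kube-system logs ds/calico-node -c calico-node | grep -iE 'bpf|kube-proxy'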
I see that kube-proxy is running. It needs to be disabled. Can you confirm whether you followed the steps in https://docs.tigera.io/calico/latest/operations/ebpf/enabling-ebpf?
Felix in node is restarting a lot due to kube-proxy in ipvs mode.
2024-06-19 02:12:07.689 [INFO][13926] felix/int_dataplane.go 1347: kube-proxy mode changed. Restart felix. ipvsIfaceState="down" ipvsSupport=false
Please make sure that you follow all the steps when enabling eBPF, including setting the config map, etc.
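For reference, the config map step in that guide tells calico-node how to reach the API server directly instead of via the kube-proxy-managed service IP. A rough sketch (the namespace depends on the install method: tigera-operator for operator installs, kube-system for manifest installs; the host and port are placeholders for your own API server or load balancer):
kubectl create configmap -n kube-system kubernetes-services-endpoint \
  --from-literal=KUBERNETES_SERVICE_HOST='<api-server-host>' \
  --from-literal=KUBERNETES_SERVICE_PORT='6443'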
@sridhartigera @tomastigera Thanks, it's working fine after the steps below:
I. kubectl patch ds -n kube-system kube-proxy -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}'
II. AND the node must be rebooted.
At the beginning I only executed kubectl patch ds -n kube-system kube-proxy -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}' but did not reboot the node.
Maybe there is a better way (without a reboot)?
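(To confirm the patch actually removed kube-proxy from the nodes, a check like this should come back empty, assuming the standard kubeadm label:
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide)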
Working result:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/master1 Ready control-plane 21h v1.30.1 192.168.55.111 <none> Ubuntu 20.04.6 LTS 5.4.0-186-generic containerd://1.6.33
node/master2 Ready control-plane 21h v1.30.1 192.168.55.112 <none> Ubuntu 20.04.6 LTS 5.4.0-186-generic containerd://1.6.33
node/master3 Ready control-plane 21h v1.30.1 192.168.55.113 <none> Ubuntu 20.04.6 LTS 5.4.0-186-generic containerd://1.6.33
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ingress-nginx pod/ingress-nginx-controller-6f848d7788-dg7mp 1/1 Running 1 (9m ago) 18h 10.244.137.92 master1 <none> <none>
ingress-nginx pod/ingress-nginx-controller-6f848d7788-z5gz7 1/1 Running 1 (9m ago) 18h 10.244.137.90 master1 <none> <none>
kube-system pod/calico-kube-controllers-564985c589-krlm5 1/1 Running 18 (11m ago) 18h 10.244.180.5 master2 <none> <none>
kube-system pod/calico-node-9vpng 1/1 Running 20 (8m39s ago) 21h 192.168.55.113 master3 <none> <none>
kube-system pod/calico-node-h6z47 1/1 Running 26 (11m ago) 21h 192.168.55.112 master2 <none> <none>
kube-system pod/calico-node-pltbx 1/1 Running 8 (9m ago) 21h 192.168.55.111 master1 <none> <none>
kube-system pod/coredns-7db6d8ff4d-c26pp 1/1 Running 3 (11m ago) 18h 10.244.180.4 master2 <none> <none>
kube-system pod/coredns-7db6d8ff4d-zqmwt 1/1 Running 4 (8m39s ago) 18h 10.244.136.5 master3 <none> <none>
kube-system pod/kube-apiserver-master1 1/1 Running 8 (9m ago) 21h 192.168.55.111 master1 <none> <none>
kube-system pod/kube-apiserver-master2 1/1 Running 17 (11m ago) 21h 192.168.55.112 master2 <none> <none>
kube-system pod/kube-apiserver-master3 1/1 Running 12 (8m39s ago) 21h 192.168.55.113 master3 <none> <none>
kube-system pod/kube-controller-manager-master1 1/1 Running 6 (9m ago) 21h 192.168.55.111 master1 <none> <none>
kube-system pod/kube-controller-manager-master2 1/1 Running 30 (11m ago) 21h 192.168.55.112 master2 <none> <none>
kube-system pod/kube-controller-manager-master3 1/1 Running 29 (8m39s ago) 21h 192.168.55.113 master3 <none> <none>
kube-system pod/kube-scheduler-master1 1/1 Running 6 (9m ago) 21h 192.168.55.111 master1 <none> <none>
kube-system pod/kube-scheduler-master2 1/1 Running 28 (11m ago) 21h 192.168.55.112 master2 <none> <none>
kube-system pod/kube-scheduler-master3 1/1 Running 27 (8m39s ago) 21h 192.168.55.113 master3 <none> <none>
kube-system pod/kubelet-csr-approver-6df44c648f-kkhn7 1/1 Running 20 (8m39s ago) 18h 10.244.136.6 master3 <none> <none>
kube-system pod/metrics-server-758fd799ff-p6txd 1/1 Running 2 (8m39s ago) 18h 10.244.137.91 master1 <none> <none>
kubernetes-dashboard pod/dashboard-metrics-scraper-795895d745-vfcgj 1/1 Running 1 (9m ago) 18h 10.244.137.93 master1 <none> <none>
kubernetes-dashboard pod/kubernetes-dashboard-697d5b47c4-p9hz6 1/1 Running 2 (8m14s ago) 18h 10.244.137.89 master1 <none> <none>
curl http://localhost:9099/liveness
+---------------------------+---------+----------------+-----------------+--------+
| COMPONENT | TIMEOUT | LIVENESS | READINESS | DETAIL |
+---------------------------+---------+----------------+-----------------+--------+
| BPFEndpointManager | - | - | reporting ready | |
| CalculationGraph | 30s | reporting live | reporting ready | |
| FelixStartup | - | reporting live | reporting ready | |
| InternalDataplaneMainLoop | 1m30s | reporting live | reporting ready | |
+---------------------------+---------+----------------+-----------------+--------+
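As an extra sanity check that the eBPF dataplane is attached, inspecting tc on the node's main interface should show Calico's BPF programs; eth0 below is a placeholder for your actual interface name:
tc filter show dev eth0 ingress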
I also tried:
I. kubectl -n kube-system delete ds kube-proxy and kubectl -n kube-system delete cm kube-proxy
II. kubectl apply -f https://github.com/projectcalico/calico/blob/v3.28.0/manifests/calico-bpf.yaml
The result is the same. Maybe the cluster MUST be initialized with kubeadm init --skip-phases=addon/kube-proxy?
curl http://localhost:9099/liveness
+---------------------------+---------+----------------+---------------------+-----------------+
| COMPONENT | TIMEOUT | LIVENESS | READINESS | DETAIL |
+---------------------------+---------+----------------+---------------------+-----------------+
| BPFEndpointManager | - | - | reporting non-ready | Not yet synced. |
| CalculationGraph | 30s | reporting live | reporting ready | |
| FelixStartup | - | reporting live | reporting ready | |
| InternalDataplaneMainLoop | 1m30s | reporting live | reporting non-ready | |
+---------------------------+---------+----------------+---------------------+-----------------+
Finally, I tried kubeadm init --skip-phases=addon/kube-proxy. Indeed it is the cleanest option; everything goes well.
Finally, I found the way to do it without a reboot:
ip link delete kube-ipvs0 :)
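For a fuller per-node cleanup, assuming ipvsadm is installed, flushing the leftover IPVS virtual services alongside removing the dummy interface looks roughly like:
ipvsadm --clear            # flush any IPVS services kube-proxy left behind
ip link delete kube-ipvs0  # remove the dummy interface kube-proxy created for IPVS mode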
Were you running kube-proxy in ipvs mode before switching to eBPF? If so, we don't support going from ipvs mode to eBPF. Better to switch to iptables mode before moving to eBPF.
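On a kubeadm cluster, switching kube-proxy back to iptables mode is typically a matter of editing its config map and restarting the DaemonSet; a sketch (note that nodes which already ran ipvs mode keep the kube-ipvs0 interface until it is removed or the node is rebooted):
kubectl -n kube-system edit cm kube-proxy              # set mode: "iptables" (an empty value also defaults to iptables)
kubectl -n kube-system rollout restart ds kube-proxy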
Got it, thanks for the reply
Closing this issue.
Thanks!!! I switched from kube-proxy with ipvs (kubekey deployed) to Calico eBPF, and all of Calico was going nuts, until I ran ip link delete kube-ipvs0 on each node! A reboot would also do it, true.
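(A quick check that the interface is really gone and no IPVS services remain, assuming ipvsadm is installed:
ip link show kube-ipvs0    # should now report that the device does not exist
ipvsadm -ln                # should list no virtual services)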
Your Environment
Calico version: v3.28.0
Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes v1.30.1
Operating System and version:
Linux master1 5.4.0-173-generic #191-Ubuntu SMP Fri Feb 2 13:55:07 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux