But the calico-node process in the OS listens on this port
That is correct, calico-node serves as the kube-proxy.
service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng01-7df5975f6dxhxdxx-2gkhr" service="ingress-nginx/ingress-nginx-controller" port=32074
Do you have some more logs around this line? Does this happen after a configuration change? Do I understand correctly that if you restart the calico-node pod, the issue is resolved? Do you get a correct response from the health check then?
There seems to be an issue where we do not release the port (fast enough) when we restart our kube-proxy. After restarting the proxy (which happens when the config changes, when a node gets an external IP, etc.) we may not be able to listen on that port again, and you get a response from the old process (hypothesis).
The service uses externalTrafficPolicy: Local to see the real IP in incoming packets.
Note that when using NodePorts with Calico in eBPF mode you always see the client's IP (unlike with regular k8s). So unless you want to prevent forwarding to another node for performance reasons, you do not need to use Local.
Do you have some more logs around this line? Does this happen after a configuration change? Do I understand correctly that if you restart the calico-node pod, the issue is resolved? Do you get a correct response from the health check then?
I uploaded logs from the calico-node pod after restarting the VM. The log contains information about the workload restart around 06:12:39 UTC (09:12:39 local time).
Here are the logs around the error on startup:
> grep "listen tcp :32074: bind: address already in use" -A5 -B5 ~/tmp/calico_issue_8112/3.log.20231013-091314
2023-10-13T09:07:04.01579622+03:00 stdout F 2023-10-13 06:07:04.002 [DEBUG][46] felix/attach.go 78: Couldn't get the attached TC program ID. attachPoint=&tc.AttachPoint{Type:"host", ToOrFrom:"from", Hook:0, Iface:"vxlan.calico", LogLevel:"off", HostIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xdc, 0x1, 0x7}, HostTunnelIP:net.IP{0
x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xde, 0x14, 0x80}, IntfIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xde, 0x14, 0x80}, FIB:true, ToHostDrop:false, DSR:true, DSROptoutCIDRs:false, TunnelMTU:0x5aa, VXLANPort:0x12b5, WgPort:0xca6c, ExtToServiceConnmark:0x0, PSNATStart:0x4e20, PSNATEnd:0x752f, IPv6Ena
bled:false, RPFEnforceOption:0x2, NATin:0x0, NATout:0x0} error=Couldn't find 'id <ID> in 'tc filter' command out=
2023-10-13T09:07:04.015798999+03:00 stdout F err=no TC program attached
2023-10-13T09:07:04.015802667+03:00 stdout F 2023-10-13 06:07:04.002 [DEBUG][46] felix/attach.go 157: Continue with attaching BPF program from_hep_fib_dsr_no_log_co-re.o attachPoint=&tc.AttachPoint{Type:"host", ToOrFrom:"from", Hook:0, Iface:"vxlan.calico", LogLevel:"off", HostIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xdc, 0x1
, 0x7}, HostTunnelIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xde, 0x14, 0x80}, IntfIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xde, 0x14, 0x80}, FIB:true, ToHostDrop:false, DSR:true, DSROptoutCIDRs:false, TunnelMTU:0x5aa, VXLANPort:0x12b5, WgPort:0xca6c, ExtToServiceConnmark:0x0, PSNATStart:0x4
e20, PSNATEnd:0x752f, IPv6Enabled:false, RPFEnforceOption:0x2, NATin:0x0, NATout:0x0}
2023-10-13T09:07:04.093895885+03:00 stdout F I1013 06:07:04.093813 46 shared_informer.go:277] Caches are synced for service config
2023-10-13T09:07:04.094058515+03:00 stdout F I1013 06:07:04.093978 46 shared_informer.go:277] Caches are synced for endpoint slice config
2023-10-13T09:07:04.094462962+03:00 stdout F E1013 06:07:04.094420 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
2023-10-13T09:07:04.094717749+03:00 stdout F 2023-10-13 06:07:04.094 [INFO][46] felix/syncer.go 651: Loading BPF map state from dataplane
2023-10-13T09:07:04.094725379+03:00 stdout F 2023-10-13 06:07:04.094 [DEBUG][46] felix/syncer.go 970: resolved NATKey{Proto:6 Addr:10.222.116.105 Port:80 SrcAddr:0.0.0.0/0} as ingress-nginx/ingress-nginx-controller:http
2023-10-13T09:07:04.095034304+03:00 stdout F 2023-10-13 06:07:04.094 [DEBUG][46] felix/syncer.go 970: resolved NATKey{Proto:6 Addr:10.222.118.75 Port:443 SrcAddr:0.0.0.0/0} as victoria-monitoring/victoria-metrics-operator:webhook
2023-10-13T09:07:04.095042921+03:00 stdout F 2023-10-13 06:07:04.094 [DEBUG][46] felix/syncer.go 924: resolved NATKey{Proto:6 Addr:10.220.1.7 Port:32406 SrcAddr:0.0.0.0/0} as NodePort:10.220.1.7:ingress-nginx/ingress-nginx-controller:https
2023-10-13T09:07:04.095070978+03:00 stdout F 2023-10-13 06:07:04.094 [DEBUG][46] felix/syncer.go 924: resolved NATKey{Proto:6 Addr:10.220.1.7 Port:31224 SrcAddr:0.0.0.0/0} as NodePort:10.220.1.7:ingress-nginx/ingress-nginx-controller:http
--
2023-10-13T09:07:10.148434097+03:00 stdout F 2023-10-13 06:07:10.147 [DEBUG][46] felix/asm.go 656: Asm: 36: StoreReg32 dst=R9 src=R1 off=84 imm=0x00000000/0
2023-10-13T09:07:10.148436431+03:00 stdout F 2023-10-13 06:07:10.147 [DEBUG][46] felix/asm.go 656: Asm: 37: MovImm64 dst=R0 src=R0 off=0 imm=0x00000002/2
2023-10-13T09:07:10.148438705+03:00 stdout F 2023-10-13 06:07:10.147 [DEBUG][46] felix/asm.go 656: Asm: 38: Exit dst=R0 src=R0 off=0 imm=0x00000000/0
2023-10-13T09:07:10.148443574+03:00 stdout F 2023-10-13 06:07:10.147 [DEBUG][46] felix/bpf_syscall.go 42: LoadBPFProgramFromInsns([Mov64 dst=R6 src=R1 off=0 imm=0x00000000/0 MovImm64 dst=R1 src=R0 off=0 imm=0x00000000/0 StoreReg32 dst=R10 src=R1 off=-4 imm=0x00000000/0 Mov64 dst=R2 src=R10 off=0 imm=0x00000000/0 AddImm64 dst=R2 src=R0 off=0 imm=0xfffffffc/-4 Loa
dImm64 dst=R1 src=R1 off=0 imm=0x0000000c/12 LoadImm64Pt2 dst=R0 src=R0 off=0 imm=0x00000000/0 Call dst=R0 src=R0 off=0 imm=0x00000001/1 JumpEqImm64 dst=R0 src=R0 off=17 imm=0x00000000/0 Mov64 dst=R9 src=R0 off=0 imm=0x00000000/0 LoadReg64 dst=R1 src=R9 off=408 imm=0x00000000/0 AndImm64 dst=R1 src=R0 off=0 imm=0x0000000c/12 JumpNEImm64 dst=R1 src=R0 off=1 imm=0x
00000000/0 JumpA dst=R0 src=R0 off=0 imm=0x00000000/0 LoadReg8 dst=R1 src=R9 off=100 imm=0x00000000/0 JumpGEImm64 dst=R1 src=R0 off=12 imm=0x00000020/32 Mov64 dst=R2 src=R1 off=0 imm=0x00000000/0 AddImm64 dst=R2 src=R0 off=0 imm=0x00000001/1 StoreReg8 dst=R9 src=R2 off=100 imm=0x00000000/0 ShiftLImm64 dst=R1 src=R0 off=0 imm=0x00000003/3 AddImm64 dst=R1 src=R0 o
ff=0 imm=0x00000068/104 LoadImm64 dst=R2 src=R0 off=0 imm=0x70517a6e/1884387950 LoadImm64Pt2 dst=R0 src=R0 off=0 imm=0xc11d3624/-1055050204 Add64 dst=R1 src=R9 off=0 imm=0x00000000/0 StoreReg64 dst=R1 src=R2 off=0 imm=0x00000000/0 JumpA dst=R0 src=R0 off=2 imm=0x00000000/0 MovImm64 dst=R0 src=R0 off=0 imm=0x00000002/2 Exit dst=R0 src=R0 off=0 imm=0x00000000/0 Mo
vImm32 dst=R1 src=R0 off=0 imm=0x00000001/1 StoreReg32 dst=R9 src=R1 off=84 imm=0x00000000/0 Mov64 dst=R1 src=R6 off=0 imm=0x00000000/0 LoadImm64 dst=R2 src=R1 off=0 imm=0x00000050/80 LoadImm64Pt2 dst=R0 src=R0 off=0 imm=0x00000000/0 MovImm32 dst=R3 src=R0 off=0 imm=0x00000003/3 Call dst=R0 src=R0 off=0 imm=0x0000000c/12 MovImm32 dst=R1 src=R0 off=0 imm=0x000000
0a/10 StoreReg32 dst=R9 src=R1 off=84 imm=0x00000000/0 MovImm64 dst=R0 src=R0 off=0 imm=0x00000002/2 Exit dst=R0 src=R0 off=0 imm=0x00000000/0], Apache-2.0, 3)
2023-10-13T09:07:10.148446129+03:00 stdout F 2023-10-13 06:07:10.147 [DEBUG][46] felix/bpf_syscall.go 84: tryLoadBPFProgramFromInsns(..., Apache-2.0, 0, 3)
2023-10-13T09:07:10.148737839+03:00 stdout F E1013 06:07:10.148698 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
2023-10-13T09:07:10.151438732+03:00 stdout F 2023-10-13 06:07:10.148 [DEBUG][46] felix/attach.go 306: Found old calico program prog=tc.attachedProg{pref:"49152", handle:"0x1"}
2023-10-13T09:07:10.151449662+03:00 stdout F 2023-10-13 06:07:10.148 [DEBUG][46] felix/asm.go 656: Asm: start 0: Mov64 dst=R6 src=R1 off=0 imm=0x00000000/0
2023-10-13T09:07:10.151452708+03:00 stdout F 2023-10-13 06:07:10.148 [DEBUG][46] felix/asm.go 656: Asm: 1: MovImm64 dst=R1 src=R0 off=0 imm=0x00000000/0
2023-10-13T09:07:10.151455112+03:00 stdout F 2023-10-13 06:07:10.148 [DEBUG][46] felix/asm.go 656: Asm: 2: StoreReg32 dst=R10 src=R1 off=-4 imm=0x00000000/0
2023-10-13T09:07:10.151457427+03:00 stdout F 2023-10-13 06:07:10.148 [DEBUG][46] felix/asm.go 656: Asm: 3: Mov64 dst=R2 src=R10 off=0 imm=0x00000000/0
---
Does this happen after a configuration change?
Restarting a workload in the cluster triggers a new error in the logs:
❯ k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ingress-nginx-controller-8d9c994d4-mb8d8 1/1 Running 0 21m 10.222.20.142 ng02-m62nm-sjkt7 <none> <none>
❯ k delete pods ingress-nginx-controller-8d9c994d4-mb8d8
pod "ingress-nginx-controller-8d9c994d4-mb8d8" deleted
❯ k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ingress-nginx-controller-8d9c994d4-vxh6j 1/1 Running 0 18s 10.222.6.78 ng02-m62nm-x9lmv <none> <none>
❯ k delete pods ingress-nginx-controller-8d9c994d4-vxh6j
pod "ingress-nginx-controller-8d9c994d4-vxh6j" deleted
❯ k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ingress-nginx-controller-8d9c994d4-hf2z9 0/1 Running 0 16s 10.222.20.143 ng02-m62nm-sjkt7 <none> <none>
❯ k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ingress-nginx-controller-8d9c994d4-hf2z9 1/1 Running 0 38s 10.222.20.143 ng02-m62nm-sjkt7 <none> <none>
Logs from the calico-node pod on node ng02-m62nm-sjkt7:
❯ k -n kube-system logs calico-node-4wbhf --tail=10 -f | grep "listen tcp :32074: bind: address already in use"
Defaulted container "calico-node" out of: calico-node, upgrade-ipam (init), install-cni (init), mount-bpffs (init)
E1013 06:34:30.719336 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:34:31.758363 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:34:32.792317 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:34:41.127877 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:34:42.147808 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:34:43.162701 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:02.486456 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:03.513046 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:04.530426 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:14.577946 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:15.600991 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:22.892817 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:23.927837 46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
And the calico-node -felix process listens on the port even when the error occurs:
root@ng02-m62nm-sjkt7:~# ss -antpl | grep calico-node
LISTEN 0 30000 127.0.0.1:9099 0.0.0.0:* users:(("calico-node",pid=3334,fd=9))
LISTEN 0 30000 *:32074 *:* users:(("calico-node",pid=3334,fd=70))
LISTEN 0 30000 *:9091 *:* users:(("calico-node",pid=3334,fd=68))
root@ng02-m62nm-sjkt7:~# lsof -nP -iTCP:32074
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
calico-no 3334 root 70u IPv6 63277 0t0 TCP *:32074 (LISTEN)
root@ng02-m62nm-sjkt7:~# ps -ef | grep 3334
root 3334 3329 1 09:07 ? 00:00:41 calico-node -felix
# The VM ng02-m62nm-sjkt7 was booted at 09:07
But lsof says that calico is listening on an IPv6 port. Is it possible that felix is trying to listen on the port on both IPv4 and IPv6?
If you restart the calico-node pod, is this issue resolved?
Yes. When we restart the calico-node pod, or just the calico-node -felix process from the OS, the HealthCheckNodePort gives the correct response.
But lsof says that calico is listening on an IPv6 port. Is it possible that felix is trying to listen on the port on both IPv4 and IPv6?
Usually when you ask to listen on any IP, that includes IPv6 (and the socket may appear as listening on IPv6 only) if you have IPv6 enabled. That is not an issue. Also note that this code is imported from k8s and is the same as the code in kube-proxy.
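For illustration, a minimal standalone Go sketch (not Calico code, port number taken from this issue) of both effects: listening on ":32074" on a host with IPv6 enabled produces a dual-stack socket that ss/lsof report as an IPv6 listener, and a second bind to the same port fails with exactly the error seen in the calico-node logs.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// First bind: with IPv6 enabled on the host, Go opens a dual-stack socket,
	// which ss/lsof show as an IPv6 (or "*") listener even though it also
	// accepts IPv4 connections.
	l1, err := net.Listen("tcp", ":32074")
	if err != nil {
		fmt.Println("first bind failed:", err)
		return
	}
	defer l1.Close()
	fmt.Println("first bind ok:", l1.Addr())

	// Second bind to the same port without closing the first listener fails,
	// which is the situation the restarted kube-proxy frontend runs into:
	// "listen tcp :32074: bind: address already in use"
	_, err = net.Listen("tcp", ":32074")
	fmt.Println("second bind:", err)
}
```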
Let me review the logs and get back to you, I think I know what is happening :crossed_fingers:
@tomastigera Hello, any updates for this issue?
felix/kube-proxy.go 214: kube-proxy OnHostIPsUpdate: [10.220.1.7 10.222.20.128]
...
felix/kube-proxy.go 172: kube-proxy stopped to restart with updated host IPs
...
felix/kube-proxy.go 137: kube-proxy started, hostname="ng02-m62nm-sjkt7" hostIPs=[10.220.1.7 10.222.20.128]
...
service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
I think the previous incarnation of kube-proxy still holds the port, so we need to restart it more gracefully
@tomastigera yes, every OnHostIPsUpdate produces this situation. On the first start of the VM we see it because the vxlan interface is created after kube-proxy starts.
What do you mean by "more gracefully"? I searched for where felix releases the ports, but on stop it only creates a new proxy and a new dpsyncer. If you show me where, I can try to fix it, test it, and prepare a PR.
I tried to add a lock, but it doesn't help (as I expected).
The issue is that the kube-proxy frontend that we use in our kube-proxy does not expect to be shut down in any way other than a hard stop of the process, while we "restart" the kube-proxy when the host IP changes, since that was an easy way to reconcile the NAT tables. However, the web servers that handle the health checks don't shut down. So we need to be more careful about how we handle that, without control over the k8s part of the code.
I thought the same. Fast fix: "suicide" - just do a graceful shutdown and wait until k8s restarts calico-node. But I think that would cause loss of traffic.
Right fix: fork the k8s lib and add a method to the interface that closes all listeners.
May I open a PR? And which fix do you prefer?
We can't really do the suicide, and we do not really want to fork and maintain the k8s lib either ;-) There are at least 3 other ways I can see.
I think my favourite option is something between 1 and 2, but a quick fix for you might be 3. All you need to do is add, in proxy.Stop() here, a call to p.svcHealthServer.SyncServices() and p.svcHealthServer.SyncEndpoints() with empty maps, which should close the servers as per the comment here: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/healthcheck/service_health.go#L42-L52 Give it a shot! :crossed_fingers:
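For reference, a minimal sketch of that quick fix (option 3). The SyncServices/SyncEndpoints calls with empty maps are the suggestion above; the surrounding Proxy struct and field names are simplified for illustration and do not match the real felix/bpf/proxy/proxy.go exactly.

```go
package proxy

import (
	"sync"

	log "github.com/sirupsen/logrus"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/kubernetes/pkg/proxy/healthcheck"
)

// Proxy is heavily simplified here; the real type lives in felix/bpf/proxy.
type Proxy struct {
	svcHealthServer healthcheck.ServiceHealthServer
	stopOnce        sync.Once
	// ... other fields elided ...
}

func (p *Proxy) Stop() {
	p.stopOnce.Do(func() {
		// ... existing shutdown of the runners/informers ...

		// Syncing with empty maps tells the health server that no service owns a
		// health check port anymore, so it closes its listeners and releases the
		// HealthCheckNodePort sockets before a new proxy instance binds them.
		if err := p.svcHealthServer.SyncServices(map[types.NamespacedName]uint16{}); err != nil {
			log.WithError(err).Warn("Failed to sync services on proxy stop")
		}
		if err := p.svcHealthServer.SyncEndpoints(map[types.NamespacedName]int{}); err != nil {
			log.WithError(err).Warn("Failed to sync endpoints on proxy stop")
		}
	})
}
```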
@lelvisl @xander-sh could you try this 3.26 based image with a fix? thruby/calico-node:health-3-26
@lelvisl @xander-sh could you try this 3.26 based image with a fix?
thruby/calico-node:health-3-26
We updated the calico-node DaemonSet images to:
image: thruby/calico-node:health-3-26
image: registry.local/calico/cni:v3.26.3
image: registry.local/calico/cni:v3.26.3
image: thruby/calico-node:health-3-26
All tests passed successfully with externalTrafficPolicy: Local. There are no errors in the logs, and the HealthCheckNodePort also works properly!
We have a vanilla Kubernetes cluster v1.22.15 with Calico v3.25.0 on the eBPF dataplane, running on virtual machines (Ubuntu 20.04). For load-balancing services we use an external load balancer that sends traffic to the NodePort and watches the HealthCheckNodePort for a response indicating active local endpoints.
https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip
An nginx ingress controller or other workload is deployed in the k8s cluster, and its service uses externalTrafficPolicy: Local to see the real IP in incoming packets.
Expected Behavior
A correct response from the HealthCheckNodePort when there is a local endpoint of the workload on the node.
Current Behavior
calico-node has an error in its logs after rebooting or creating the VM.
And the HealthCheckNodePort gives a wrong response on this node.
But the calico-node process in the OS listens on this port.
Possible Solution
Restart the calico-node pod on this node, or restart the calico-node -felix process from the OS.
Steps to Reproduce (for bugs)
1. Vanilla Kubernetes cluster with Calico CNI and the eBPF dataplane
2. Create a Kubernetes service of type LoadBalancer with externalTrafficPolicy: Local
3. Reboot the node where endpoints of the service are located
4. See the logs in calico-node and curl the HealthCheckNodePort on this node (see the sketch after this list)
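A minimal Go sketch of what step 4's HealthCheckNodePort check looks like; the node IP and port are taken from the logs above and would need to be adjusted for another cluster. The health server (the same kube-proxy code that calico-node imports) is expected to answer 200 with a localEndpoints count when the node hosts a ready local endpoint, and 503 otherwise.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Values taken from the logs in this issue; adjust for your cluster.
	nodeIP := "10.220.1.7"
	healthCheckNodePort := 32074

	resp, err := http.Get(fmt.Sprintf("http://%s:%d/", nodeIP, healthCheckNodePort))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// Expected: 200 and a body reporting "localEndpoints" >= 1 when the node has a
	// ready local endpoint for the service; 503 when it does not.
	fmt.Println("status:", resp.StatusCode)
	fmt.Println("body:", string(body))
}
```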
Context
As the HealthCheckNodePort does not respond as expected after rebooting or creating a node, we experience loss of traffic to the workload. We have checked the code and found where it is most likely happening:
https://github.com/kubernetes/kubernetes/blob/57d3cc66050545d7fefb0841e750a0a41df14144/pkg/proxy/healthcheck/service_health.go#L145
https://github.com/projectcalico/calico/blob/b5339f1febc84725efbeae6d17d93e2940ef4a94/felix/bpf/proxy/proxy.go#L242C12-L242C42
Your Environment
Felix configuration: