projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

HealthCheck NodePort servers leak when kube-proxy gets restarted due to host IP update #8112

Closed xander-sh closed 10 months ago

xander-sh commented 1 year ago

We have a vanilla Kubernetes cluster v1.22.15 with Calico v3.25.0 in eBPF dataplane mode on virtual machines (Ubuntu 20.04). For load balancing, we use an external load balancer that sends traffic to a NodePort and watches the HealthCheckNodePort response to find nodes with active local endpoints.
https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip

The NGINX ingress controller (or another workload) is deployed in the cluster, and its service uses externalTrafficPolicy: Local so that incoming packets show the real client IP.

Expected Behavior

The HealthCheckNodePort returns the correct response when the workload has a local endpoint on the node.

Current Behavior

calico-node logs an error after a VM is rebooted or created:

service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng01-7df5975f6dxhxdxx-2gkhr" service="ingress-nginx/ingress-nginx-controller" port=32074

And the HealthCheckNodePort gives a wrong response on this node:

root@ng01-7df5975f6dxhxdxx-2gkhr:~# crictl ps |grep ingress
9d7aefa0e13d0       a4abbd85b9ebb       About a minute ago   Running             controller                  0                   bdc49ecb88518       ingress-nginx-controller-8d9c994d4-nszhp
root@ng01-7df5975f6dxhxdxx-2gkhr:~# curl http://10.220.1.5:32074
{
        "service": {
                "namespace": "ingress-nginx",
                "name": "ingress-nginx-controller"
        },
        "localEndpoints": 0

But the calico-node process on the host is listening on this port:

root@ng01-7df5975f6dxhxdxx-2gkhr:~# ss -antpl | grep 32074
LISTEN    0         30000                    *:32074                  *:*        users:(("calico-node",pid=3130,fd=70))
root@ng01-7df5975f6dxhxdxx-2gkhr:~# crictl ps | grep calico
c0d741d7b92e8       8065b798a4d67       16 minutes ago      Running             calico-node                 4                   305a93a0aabaf       calico-node-xnc2z
root@ng01-7df5975f6dxhxdxx-2gkhr:~# crictl inspect c0d741d7b92e8 | grep -i pid
    "pid": 3076,
            "pid": 1
            "type": "pid"
root@ng01-7df5975f6dxhxdxx-2gkhr:~#
# "calico-node -felix" container process with pid 3130 belong to pod calico-node with pid 3076
root      2561  0.2  0.2 114108  9332 ?        Sl   15:30   0:01 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 305a93a0aabafa4274c43d0b890d1ab0e025e06d88f2e7b4900e8c7268f63278 -address /run/containerd/containerd.sock
65535     2625  0.0  0.0    972     4 ?        Ss   15:30   0:00  \_ /pause
root      3076  0.0  0.0   4404   888 ?        Ss   15:30   0:00  \_ /usr/local/bin/runsvdir -P /etc/service/enabled
root      3125  0.0  0.0   4252   820 ?        Ss   15:30   0:00      \_ runsv felix
root      3130  1.7  1.6 1830784 65072 ?       Sl   15:30   0:16      |   \_ calico-node -felix
root      3126  0.0  0.0   4252   832 ?        Ss   15:30   0:00      \_ runsv monitor-addresses
root      3132  0.0  1.1 1387624 47712 ?       Sl   15:30   0:00      |   \_ calico-node -monitor-addresses
root      3127  0.0  0.0   4252   900 ?        Ss   15:30   0:00      \_ runsv allocate-tunnel-addrs
root      3133  0.0  1.3 1240160 52716 ?       Sl   15:30   0:00      |   \_ calico-node -allocate-tunnel-addrs
root      3128  0.0  0.0   4252   896 ?        Ss   15:30   0:00      \_ runsv node-status-reporter
root      3131  0.0  1.1 1240160 47540 ?       Sl   15:30   0:00      |   \_ calico-node -status-reporter
root      3129  0.0  0.0   4252   904 ?        Ss   15:30   0:00      \_ runsv cni
root      3134  0.0  1.2 1313892 48124 ?       Sl   15:30   0:00          \_ calico-node -monitor-token

Possible Solution

Restart the calico-node pod on this node, or restart the calico-node -felix process from the OS.

# restart calico-node process
root@ng01-7df5975f6dxhxdxx-2gkhr:~# kill 3130
root@ng01-7df5975f6dxhxdxx-2gkhr:~# lsof -nP -iTCP:32074
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
calico-no 13130 root   70u  IPv6 126998      0t0  TCP *:32074 (LISTEN)
root@ng01-7df5975f6dxhxdxx-2gkhr:~# ss -antpl | grep 32074
LISTEN    0         30000                    *:32074                  *:*        users:(("calico-node",pid=13130,fd=70))

root      2561  0.2  0.2 114108  9332 ?        Sl   15:30   0:02 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 305a93a0aabafa4274c43d0b890d1ab0e025e06d88f2e7b4900e8c7268f63278 -address /run/containerd/containerd.sock
65535     2625  0.0  0.0    972     4 ?        Ss   15:30   0:00  \_ /pause
root      3076  0.0  0.0   4404   888 ?        Ss   15:30   0:00  \_ /usr/local/bin/runsvdir -P /etc/service/enabled
root      3125  0.0  0.0   4252   820 ?        Ss   15:30   0:00      \_ runsv felix
root     13130  1.9  1.5 1609844 62512 ?       Sl   15:48   0:03      |   \_ calico-node -felix
root      3126  0.0  0.0   4252   832 ?        Ss   15:30   0:00      \_ runsv monitor-addresses
root      3132  0.0  1.1 1387624 47712 ?       Sl   15:30   0:00      |   \_ calico-node -monitor-addresses
root      3127  0.0  0.0   4252   900 ?        Ss   15:30   0:00      \_ runsv allocate-tunnel-addrs
root      3133  0.0  1.3 1240160 52716 ?       Sl   15:30   0:00      |   \_ calico-node -allocate-tunnel-addrs
root      3128  0.0  0.0   4252   896 ?        Ss   15:30   0:00      \_ runsv node-status-reporter
root      3131  0.0  1.1 1240160 47540 ?       Sl   15:30   0:00      |   \_ calico-node -status-reporter
root      3129  0.0  0.0   4252   904 ?        Ss   15:30   0:00      \_ runsv cni
root      3134  0.0  1.2 1313892 48124 ?       Sl   15:30   0:00          \_ calico-node -monitor-token

# After restarting calico-node on this node, the HealthCheckNodePort gives the correct response
root@ng01-7df5975f6dxhxdxx-2gkhr:~# curl http://10.220.1.5:32074
{
        "service": {
                "namespace": "ingress-nginx",
                "name": "ingress-nginx-controller"
        },
        "localEndpoints": 1

Steps to Reproduce (for bugs)

1. Vanilla Kubernetes cluster with the Calico CNI in eBPF dataplane mode
2. Create a Kubernetes Service of type LoadBalancer with externalTrafficPolicy: Local
3. Reboot the node where the endpoints of the service are located
4. Check the calico-node logs and curl the HealthCheckNodePort on this node

Context

Since the HealthCheckNodePort does not respond as expected after a node is rebooted or created, we lose traffic to the workload. We have checked the code and found where this is most likely happening: https://github.com/kubernetes/kubernetes/blob/57d3cc66050545d7fefb0841e750a0a41df14144/pkg/proxy/healthcheck/service_health.go#L145 https://github.com/projectcalico/calico/blob/b5339f1febc84725efbeae6d17d93e2940ef4a94/felix/bpf/proxy/proxy.go#L242C12-L242C42
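
For illustration, the failure mode can be reproduced in isolation. The sketch below is not Calico code; it only shows that a second net.Listen on the same port fails with exactly the error from the log while the listener of the previous proxy incarnation is still open (the port number is taken from our logs and is otherwise arbitrary).

package main

import (
	"fmt"
	"net"
)

func main() {
	// The first "incarnation" of the health check server grabs the port.
	old, err := net.Listen("tcp", ":32074")
	if err != nil {
		panic(err)
	}

	// The proxy is "restarted", but the old listener is never closed,
	// so the new incarnation cannot bind the same port.
	_, err = net.Listen("tcp", ":32074")
	fmt.Println(err) // listen tcp :32074: bind: address already in use

	// Only after the leaked listener is closed can the port be reused.
	old.Close()
}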

Your Environment

Felix configuration:

apiVersion: crd.projectcalico.org/v1
kind: FelixConfiguration
metadata:
  name: default
spec:
  bpfEnabled: true
  bpfExternalServiceMode: DSR
  bpfLogLevel: ""
  logSeverityScreen: Debug
  prometheusMetricsEnabled: true
  reportingInterval: 0s
  vxlanEnabled: true
  vxlanPort: 4789
  vxlanVNI: 4096
tomastigera commented 1 year ago

But the calico-node process on the host is listening on this port

That is correct; calico-node serves as the kube-proxy.

service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng01-7df5975f6dxhxdxx-2gkhr" service="ingress-nginx/ingress-nginx-controller" port=32074

Do you have some more logs around this line? Does this happen after the configuration changed? Do I understand correctly that if you restart the calico-node pod, the issue is resolved? Do you get a correct response from the health check then?

There seems to be an issue where we do not release the port (fast enough) when we restart our kube-proxy. After restarting the proxy (which happens when the config changes or when a node gets an external IP, etc.), we may fail to listen on that port again, and you get the response from the old process (hypothesis).

its service uses externalTrafficPolicy: Local so that incoming packets show the real client IP.

Note that with NodePorts and Calico in eBPF mode, you always see the client's IP (unlike with regular k8s kube-proxy). So unless you want to prevent forwarding to another node for performance reasons, you do not need to use Local.

xander-sh commented 1 year ago

Do you have some more logs around this line? Does this happen after the configuration changed? Do I understand correctly that if you restart the calico-node pod, the issue is resolved? Do you get a correct response from the health check then?

I uploaded logs from the calico-node pod after restarting the VM. The log contains information about the workload restart around 06:12:39 UTC (09:12:39 local time).

calico-node-4wbhf.log.zip

And here are the logs around the error on startup:

> grep "listen tcp :32074: bind: address already in use" -A5 -B5 ~/tmp/calico_issue_8112/3.log.20231013-091314

2023-10-13T09:07:04.01579622+03:00 stdout F 2023-10-13 06:07:04.002 [DEBUG][46] felix/attach.go 78: Couldn't get the attached TC program ID. attachPoint=&tc.AttachPoint{Type:"host", ToOrFrom:"from", Hook:0, Iface:"vxlan.calico", LogLevel:"off", HostIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xdc, 0x1, 0x7}, HostTunnelIP:net.IP{0
x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xde, 0x14, 0x80}, IntfIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xde, 0x14, 0x80}, FIB:true, ToHostDrop:false, DSR:true, DSROptoutCIDRs:false, TunnelMTU:0x5aa, VXLANPort:0x12b5, WgPort:0xca6c, ExtToServiceConnmark:0x0, PSNATStart:0x4e20, PSNATEnd:0x752f, IPv6Ena
bled:false, RPFEnforceOption:0x2, NATin:0x0, NATout:0x0} error=Couldn't find 'id <ID> in 'tc filter' command out=
2023-10-13T09:07:04.015798999+03:00 stdout F  err=no TC program attached
2023-10-13T09:07:04.015802667+03:00 stdout F 2023-10-13 06:07:04.002 [DEBUG][46] felix/attach.go 157: Continue with attaching BPF program from_hep_fib_dsr_no_log_co-re.o attachPoint=&tc.AttachPoint{Type:"host", ToOrFrom:"from", Hook:0, Iface:"vxlan.calico", LogLevel:"off", HostIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xdc, 0x1
, 0x7}, HostTunnelIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xde, 0x14, 0x80}, IntfIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xde, 0x14, 0x80}, FIB:true, ToHostDrop:false, DSR:true, DSROptoutCIDRs:false, TunnelMTU:0x5aa, VXLANPort:0x12b5, WgPort:0xca6c, ExtToServiceConnmark:0x0, PSNATStart:0x4
e20, PSNATEnd:0x752f, IPv6Enabled:false, RPFEnforceOption:0x2, NATin:0x0, NATout:0x0}
2023-10-13T09:07:04.093895885+03:00 stdout F I1013 06:07:04.093813      46 shared_informer.go:277] Caches are synced for service config
2023-10-13T09:07:04.094058515+03:00 stdout F I1013 06:07:04.093978      46 shared_informer.go:277] Caches are synced for endpoint slice config
2023-10-13T09:07:04.094462962+03:00 stdout F E1013 06:07:04.094420      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
2023-10-13T09:07:04.094717749+03:00 stdout F 2023-10-13 06:07:04.094 [INFO][46] felix/syncer.go 651: Loading BPF map state from dataplane
2023-10-13T09:07:04.094725379+03:00 stdout F 2023-10-13 06:07:04.094 [DEBUG][46] felix/syncer.go 970: resolved NATKey{Proto:6 Addr:10.222.116.105 Port:80 SrcAddr:0.0.0.0/0} as ingress-nginx/ingress-nginx-controller:http
2023-10-13T09:07:04.095034304+03:00 stdout F 2023-10-13 06:07:04.094 [DEBUG][46] felix/syncer.go 970: resolved NATKey{Proto:6 Addr:10.222.118.75 Port:443 SrcAddr:0.0.0.0/0} as victoria-monitoring/victoria-metrics-operator:webhook
2023-10-13T09:07:04.095042921+03:00 stdout F 2023-10-13 06:07:04.094 [DEBUG][46] felix/syncer.go 924: resolved NATKey{Proto:6 Addr:10.220.1.7 Port:32406 SrcAddr:0.0.0.0/0} as NodePort:10.220.1.7:ingress-nginx/ingress-nginx-controller:https
2023-10-13T09:07:04.095070978+03:00 stdout F 2023-10-13 06:07:04.094 [DEBUG][46] felix/syncer.go 924: resolved NATKey{Proto:6 Addr:10.220.1.7 Port:31224 SrcAddr:0.0.0.0/0} as NodePort:10.220.1.7:ingress-nginx/ingress-nginx-controller:http
--
2023-10-13T09:07:10.148434097+03:00 stdout F 2023-10-13 06:07:10.147 [DEBUG][46] felix/asm.go 656: Asm:  36:    StoreReg32 dst=R9 src=R1 off=84 imm=0x00000000/0
2023-10-13T09:07:10.148436431+03:00 stdout F 2023-10-13 06:07:10.147 [DEBUG][46] felix/asm.go 656: Asm:  37:    MovImm64 dst=R0 src=R0 off=0 imm=0x00000002/2
2023-10-13T09:07:10.148438705+03:00 stdout F 2023-10-13 06:07:10.147 [DEBUG][46] felix/asm.go 656: Asm:  38:    Exit dst=R0 src=R0 off=0 imm=0x00000000/0
2023-10-13T09:07:10.148443574+03:00 stdout F 2023-10-13 06:07:10.147 [DEBUG][46] felix/bpf_syscall.go 42: LoadBPFProgramFromInsns([Mov64 dst=R6 src=R1 off=0 imm=0x00000000/0 MovImm64 dst=R1 src=R0 off=0 imm=0x00000000/0 StoreReg32 dst=R10 src=R1 off=-4 imm=0x00000000/0 Mov64 dst=R2 src=R10 off=0 imm=0x00000000/0 AddImm64 dst=R2 src=R0 off=0 imm=0xfffffffc/-4 Loa
dImm64 dst=R1 src=R1 off=0 imm=0x0000000c/12 LoadImm64Pt2 dst=R0 src=R0 off=0 imm=0x00000000/0 Call dst=R0 src=R0 off=0 imm=0x00000001/1 JumpEqImm64 dst=R0 src=R0 off=17 imm=0x00000000/0 Mov64 dst=R9 src=R0 off=0 imm=0x00000000/0 LoadReg64 dst=R1 src=R9 off=408 imm=0x00000000/0 AndImm64 dst=R1 src=R0 off=0 imm=0x0000000c/12 JumpNEImm64 dst=R1 src=R0 off=1 imm=0x
00000000/0 JumpA dst=R0 src=R0 off=0 imm=0x00000000/0 LoadReg8 dst=R1 src=R9 off=100 imm=0x00000000/0 JumpGEImm64 dst=R1 src=R0 off=12 imm=0x00000020/32 Mov64 dst=R2 src=R1 off=0 imm=0x00000000/0 AddImm64 dst=R2 src=R0 off=0 imm=0x00000001/1 StoreReg8 dst=R9 src=R2 off=100 imm=0x00000000/0 ShiftLImm64 dst=R1 src=R0 off=0 imm=0x00000003/3 AddImm64 dst=R1 src=R0 o
ff=0 imm=0x00000068/104 LoadImm64 dst=R2 src=R0 off=0 imm=0x70517a6e/1884387950 LoadImm64Pt2 dst=R0 src=R0 off=0 imm=0xc11d3624/-1055050204 Add64 dst=R1 src=R9 off=0 imm=0x00000000/0 StoreReg64 dst=R1 src=R2 off=0 imm=0x00000000/0 JumpA dst=R0 src=R0 off=2 imm=0x00000000/0 MovImm64 dst=R0 src=R0 off=0 imm=0x00000002/2 Exit dst=R0 src=R0 off=0 imm=0x00000000/0 Mo
vImm32 dst=R1 src=R0 off=0 imm=0x00000001/1 StoreReg32 dst=R9 src=R1 off=84 imm=0x00000000/0 Mov64 dst=R1 src=R6 off=0 imm=0x00000000/0 LoadImm64 dst=R2 src=R1 off=0 imm=0x00000050/80 LoadImm64Pt2 dst=R0 src=R0 off=0 imm=0x00000000/0 MovImm32 dst=R3 src=R0 off=0 imm=0x00000003/3 Call dst=R0 src=R0 off=0 imm=0x0000000c/12 MovImm32 dst=R1 src=R0 off=0 imm=0x000000
0a/10 StoreReg32 dst=R9 src=R1 off=84 imm=0x00000000/0 MovImm64 dst=R0 src=R0 off=0 imm=0x00000002/2 Exit dst=R0 src=R0 off=0 imm=0x00000000/0], Apache-2.0, 3)
2023-10-13T09:07:10.148446129+03:00 stdout F 2023-10-13 06:07:10.147 [DEBUG][46] felix/bpf_syscall.go 84: tryLoadBPFProgramFromInsns(..., Apache-2.0, 0, 3)
2023-10-13T09:07:10.148737839+03:00 stdout F E1013 06:07:10.148698      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
2023-10-13T09:07:10.151438732+03:00 stdout F 2023-10-13 06:07:10.148 [DEBUG][46] felix/attach.go 306: Found old calico program prog=tc.attachedProg{pref:"49152", handle:"0x1"}
2023-10-13T09:07:10.151449662+03:00 stdout F 2023-10-13 06:07:10.148 [DEBUG][46] felix/asm.go 656: Asm: start 0:    Mov64 dst=R6 src=R1 off=0 imm=0x00000000/0
2023-10-13T09:07:10.151452708+03:00 stdout F 2023-10-13 06:07:10.148 [DEBUG][46] felix/asm.go 656: Asm:  1:    MovImm64 dst=R1 src=R0 off=0 imm=0x00000000/0
2023-10-13T09:07:10.151455112+03:00 stdout F 2023-10-13 06:07:10.148 [DEBUG][46] felix/asm.go 656: Asm:  2:    StoreReg32 dst=R10 src=R1 off=-4 imm=0x00000000/0
2023-10-13T09:07:10.151457427+03:00 stdout F 2023-10-13 06:07:10.148 [DEBUG][46] felix/asm.go 656: Asm:  3:    Mov64 dst=R2 src=R10 off=0 imm=0x00000000/0
---

Does this happen after the configuration changed?

Restarting a workload in the cluster triggers new errors in the logs:


❯ k get pods -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP              NODE               NOMINATED NODE   READINESS GATES
ingress-nginx-controller-8d9c994d4-mb8d8   1/1     Running   0          21m   10.222.20.142   ng02-m62nm-sjkt7   <none>           <none>
❯ k delete pods ingress-nginx-controller-8d9c994d4-mb8d8
pod "ingress-nginx-controller-8d9c994d4-mb8d8" deleted
❯ k get pods -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP            NODE               NOMINATED NODE   READINESS GATES
ingress-nginx-controller-8d9c994d4-vxh6j   1/1     Running   0          18s   10.222.6.78   ng02-m62nm-x9lmv   <none>           <none>
❯ k delete pods ingress-nginx-controller-8d9c994d4-vxh6j
pod "ingress-nginx-controller-8d9c994d4-vxh6j" deleted
❯ k get pods -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP              NODE               NOMINATED NODE   READINESS GATES
ingress-nginx-controller-8d9c994d4-hf2z9   0/1     Running   0          16s   10.222.20.143   ng02-m62nm-sjkt7   <none>           <none>
❯ k get pods -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP              NODE               NOMINATED NODE   READINESS GATES
ingress-nginx-controller-8d9c994d4-hf2z9   1/1     Running   0          38s   10.222.20.143   ng02-m62nm-sjkt7   <none>           <none>

Logs from the calico-node pod on node ng02-m62nm-sjkt7:

❯ k -n kube-system logs calico-node-4wbhf --tail=10 -f | grep "listen tcp :32074: bind: address already in use"
Defaulted container "calico-node" out of: calico-node, upgrade-ipam (init), install-cni (init), mount-bpffs (init)
E1013 06:34:30.719336      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:34:31.758363      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:34:32.792317      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:34:41.127877      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:34:42.147808      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:34:43.162701      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:02.486456      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:03.513046      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:04.530426      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:14.577946      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:15.600991      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:22.892817      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074
E1013 06:35:23.927837      46 service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074

And the calico-node -felix process keeps listening on the port even while the error occurs:

root@ng02-m62nm-sjkt7:~# ss -antpl | grep calico-node
LISTEN    0         30000            127.0.0.1:9099             0.0.0.0:*        users:(("calico-node",pid=3334,fd=9))
LISTEN    0         30000                    *:32074                  *:*        users:(("calico-node",pid=3334,fd=70))
LISTEN    0         30000                    *:9091                   *:*        users:(("calico-node",pid=3334,fd=68))
root@ng02-m62nm-sjkt7:~# lsof -nP -iTCP:32074
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
calico-no 3334 root   70u  IPv6  63277      0t0  TCP *:32074 (LISTEN)
root@ng02-m62nm-sjkt7:~# ps -ef | grep 3334
root      3334  3329  1 09:07 ?        00:00:41 calico-node -felix
# The VM ng02-m62nm-sjkt7 was booted at 09:07

But lsof says that calico is listening on an IPv6 port. Is it possible that felix tries to listen on the port on both IPv4 and IPv6?

if you restart the calico-node pod, this issue is resolved?

Yes. When we restart the calico-node pod, or just the calico-node -felix process from the OS, the HealthCheckNodePort gives the correct response.

tomastigera commented 1 year ago

But lsof says that calico is listening on an IPv6 port. Is it possible that felix tries to listen on the port on both IPv4 and IPv6?

Usually, when you ask to listen on any IP, that includes IPv6 (or may appear as listening on IPv6 only) if you have IPv6 enabled. That is not an issue. Also note that this code is imported from k8s and is the same as the code in kube-proxy.
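
For what it's worth, a trivial standalone Go listener (not Calico code; the port is just the one from this issue) shows the same thing: binding the wildcard address produces a socket that ss and lsof report under IPv6 (*:32074), yet IPv4 clients can still connect to it on a dual-stack host.

package main

import (
	"fmt"
	"net"
)

func main() {
	// Listening on the wildcard address (no host part) gives a dual-stack
	// socket on Linux; ss/lsof show it as IPv6, but IPv4 clients can still
	// connect to it.
	ln, err := net.Listen("tcp", ":32074")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	fmt.Println("listening on", ln.Addr()) // [::]:32074
	// Inspect with: ss -antpl | grep 32074   or   lsof -nP -iTCP:32074
	select {}
}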

Let me review the logs and get back to you, I think I know what is happening :crossed_fingers:

xander-sh commented 11 months ago

@tomastigera Hello, any updates for this issue?

tomastigera commented 10 months ago
felix/kube-proxy.go 214: kube-proxy OnHostIPsUpdate: [10.220.1.7 10.222.20.128]
...
felix/kube-proxy.go 172: kube-proxy stopped to restart with updated host IPs
...
felix/kube-proxy.go 137: kube-proxy started, hostname="ng02-m62nm-sjkt7" hostIPs=[10.220.1.7 10.222.20.128]
...
service_health.go:141] "Failed to start healthcheck" err="listen tcp :32074: bind: address already in use" node="ng02-m62nm-sjkt7" service="ingress-nginx/ingress-nginx-controller" port=32074

I think the previous incarnation of kube-proxy still holds the port, so we need to restart it more gracefully

lelvisl commented 10 months ago

@tomastigera yes, every OnHostIPsUpdate produces this situation. On the first start of a VM we see it because the vxlan interface is created after kube-proxy has started.

What do you mean by "more gracefully"? I searched for where felix releases the ports, but on stop it only creates a new proxy and a new dpsyncer. If you show me where, I can try to fix it, test it, and prepare a PR.

lelvisl commented 10 months ago
[Screenshot: 2023-12-08 at 6:49:25 PM]

I tried to add a lock, but it doesn't help (as I expected).

tomastigera commented 10 months ago

The issue is that the kube-proxy frontend we use in our kube-proxy does not expect to be shut down in any way other than a hard stop of the process, while we "restart" the kube-proxy when the host IP changes, because that was an easy way to reconcile the NAT tables. However, the web servers that handle the health checks don't shut down. So we need to be more careful about how we handle that, since we don't control the k8s part of the code.
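
For reference, this is roughly the shape of the upstream frontend interface (k8s.io/kubernetes/pkg/proxy/healthcheck), paraphrased here to illustrate the point: it only exposes reconciliation calls and has no Stop/Close, so its listeners are only closed when a service disappears from the synced map.

package healthcheckref // just a reference sketch, not Calico or k8s code

import "k8s.io/apimachinery/pkg/types"

// ServiceHealthServer (paraphrased): one HTTP server per service with a
// HealthCheckNodePort; SyncServices opens/closes listeners to match the map,
// SyncEndpoints updates the localEndpoints count that the servers report.
type ServiceHealthServer interface {
	// map of namespace/name -> healthCheckNodePort to listen on
	SyncServices(newServices map[types.NamespacedName]uint16) error
	// map of namespace/name -> number of local endpoints on this node
	SyncEndpoints(newEndpoints map[types.NamespacedName]int) error
}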

lelvisl commented 10 months ago

I thought the same. Fast fix: "suicide" - just do a graceful shutdown and wait until k8s restarts calico-node. But I think it would cause loss of traffic.

Right fix: fork the k8s lib and add a method to the interface that closes all listeners.

May I open a PR? And which kind of fix do you prefer?

tomastigera commented 10 months ago

There are at least 3 other ways I can see:

  1. don't restart :)
  2. let the frontend survive the restart - it does not need the host IPs
  3. there is a way to close those servers, but it may create some hiccups

We can't really do the suicide, and we do not really want to fork and maintain the k8s lib either ;-)

I think my favourite option is something between 1 and 2, but a quick fix for you might be 3. All you need to do is add, in proxy.Stop() here, a call to p.svcHealthServer.SyncServices() and p.svcHealthServer.SyncEndpoints() with empty maps, which should close the servers, as per the comment here https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/healthcheck/service_health.go#L42-L52 Give it a shot! :crossed_fingers:
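
A minimal sketch of that quick fix, assuming the felix proxy struct holds its health server in a field called svcHealthServer (the real field and type names may differ; this illustrates the suggested calls, not the actual patch):

package proxy

import (
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/kubernetes/pkg/proxy/healthcheck"
)

type proxy struct {
	stopCh          chan struct{}
	svcHealthServer healthcheck.ServiceHealthServer
}

func (p *proxy) Stop() {
	close(p.stopCh)

	// Syncing with empty maps tells the health check server that no services
	// need health checks any more, so it closes all of its listeners and the
	// NodePort (e.g. :32074) is released before the next proxy incarnation
	// tries to bind it.
	_ = p.svcHealthServer.SyncServices(map[types.NamespacedName]uint16{})
	_ = p.svcHealthServer.SyncEndpoints(map[types.NamespacedName]int{})
}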

tomastigera commented 10 months ago

@lelvisl @xander-sh could you try this 3.26 based image with a fix? thruby/calico-node:health-3-26

xander-sh commented 10 months ago

@lelvisl @xander-sh could you try this 3.26 based image with a fix? thruby/calico-node:health-3-26

We updated the calico-node DaemonSet images to:

image: thruby/calico-node:health-3-26
image: registry.local/calico/cni:v3.26.3
image: registry.local/calico/cni:v3.26.3
image: thruby/calico-node:health-3-26

All tests passed successfully.

There are no errors in the logs, and the HealthCheckNodePort works properly!