projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/

Cannot curl some Pod endpoints from some nodes after performing Node drain on unrelated nodes. #9497

Open · avin3sh opened this issue 3 days ago

avin3sh commented 3 days ago

Expected Behavior

All the endpoints should be reachable from all the nodes.

Current Behavior

  1. Drain multiple nodes in a short time (a drain sketch follows this list)
  2. Some endpoints, not necessarily those belonging to the Pods that were on the drained nodes, suddenly become unreachable from some of the worker nodes, even workers that were not drained and were left untouched
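For illustration, a minimal sketch of draining several nodes in quick succession; the node names are placeholders and the exact flags used may have differed:

# Drain three nodes back to back; --ignore-daemonsets keeps calico-node and kube-proxy in place
for n in nodea1 nodea2 nodeb1; do
  kubectl drain "$n" --cluster=test --ignore-daemonsets --delete-emptydir-data --timeout=5m
done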

Possible Solution

So far the only workaround I am aware of is restarting either of the two nodes involved. Alternatively, explicitly deleting the Pod also works, since a new endpoint gets created (see the sketch below).
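A sketch of the Pod-deletion workaround, using the Pod from the example further below; the ReplicaSet recreates the Pod and the replacement gets a fresh endpoint:

# Delete the unreachable Pod so its controller creates a replacement with a new endpoint
kubectl delete pod myapp-stg-c8ccd55b6-2jld6 -n app --cluster=test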

Steps to Reproduce (for bugs)

This is difficult to reproduce, as I couldn't find a deterministic pattern. I have seen this on a cluster with 1000+ endpoints.

  1. Make a list of all the endpoints in the cluster
  2. Have at least 2-3 nodes with ~20 endpoints
  3. Drain these nodes around the same time
  4. Curl all the endpoints in (1) except those in (2) from ALL the nodes
  5. You will notice that curl fails with "Failed to connect to <endpoint> port 80 after X ms: Couldn't connect to server" on some of the nodes for some of the endpoints
  6. You will also notice that on the nodes where curl fails, other endpoints belonging to Pods running on that same node are reachable; only some endpoint(s) belonging to that node are not reachable (a sketch automating steps 1 and 4 follows this list)
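A minimal sketch for automating steps 1 and 4, assuming the Pods answer plain HTTP on port 80 as in the examples below:

# Step 1: list every Pod IP in the cluster (jsonpath avoids column-parsing issues with -o wide)
kubectl get pods -A --cluster=test -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' > /tmp/endpoints.txt

# Step 4: run on each node; prints only the endpoints that fail to connect
while read -r ip; do
  curl -s -o /dev/null --connect-timeout 5 "http://$ip/" || echo "FAILED: $ip"
done < /tmp/endpoints.txt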

Context

Take the following example, where the Pod myapp-stg-c8ccd55b6-2jld6 (192.168.230.138) on mynodezonea1 is not reachable from mylinuxnodezoneb9.

k get pods -A --cluster=test -o wide | grep myapp-stg
app                                     myapp-stg-c8ccd55b6-2jld6                               1/1     Running                      1 (16d ago)        30d     192.168.230.138   mynodezonea1             <none>           <none>
app                                     myapp-stg-c8ccd55b6-g554x                               1/1     Running                      1 (16d ago)        30d     192.168.232.120   mynodezonea2             <none>           <none>
app                                     myapp-stg-c8ccd55b6-jn5tg                               1/1     Running                      1 (16d ago)        30d     192.168.237.93    mynodezoneb1             <none>           <none>
avinesh@mylinuxnodezoneb9:/$ curl 192.168.230.138
curl: (28) Failed to connect to 192.168.230.138 port 80 after 132207 ms: Couldn't connect to server

kubectl describe output of mynodezonea1:

PS C:\> k describe node mynodezonea1 --cluster=test
Name:               mynodezonea1
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=windows
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=mynodezonea1
                    kubernetes.io/os=windows
                    node-role.kubernetes.io/worker=worker
                    node.kubernetes.io/windows-build=10.0.20348
                    topology.kubernetes.io/region=myregion
                    topology.kubernetes.io/zone=zonea
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.228.88.200/22
                    projectcalico.org/IPv4VXLANTunnelAddr: 192.168.230.129
                    projectcalico.org/VXLANTunnelMACAddr: <trimmed>
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  <trimmed>
Taints:             os=windows:NoSchedule
Unschedulable:      false
Lease:              Failed to get lease: leases.coordination.k8s.io "mynodezonea1" is forbidden: User "username" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease"
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Sat, 19 Oct 2024 02:17:50 -0400   Sat, 19 Oct 2024 02:17:50 -0400   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Mon, 18 Nov 2024 14:01:24 -0500   Sat, 19 Oct 2024 02:17:49 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 18 Nov 2024 14:01:24 -0500   Sat, 19 Oct 2024 02:17:49 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 18 Nov 2024 14:01:24 -0500   Sat, 19 Oct 2024 02:17:49 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 18 Nov 2024 14:01:24 -0500   Sat, 19 Oct 2024 02:17:49 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.228.88.200
  Hostname:    mynodezonea1
Capacity:
<trimmed>
Allocatable:
<trimmed>
System Info:
  Machine ID:                     mynodezonea1
  System UUID:                    <trimmed>
  Boot ID:                        <trimmed>
  Kernel Version:                 <trimmed>
  OS Image:                       Windows Server 2022 Standard
  Operating System:               windows
  Architecture:                   amd64
  Container Runtime Version:      containerd://1.6.26
  Kubelet Version:                v1.27.12
  Kube-Proxy Version:             v1.27.12
PodCIDR:                          192.168.236.0/24
PodCIDRs:                         192.168.236.0/24
Non-terminated Pods:              (18 in total)
  Namespace                       Name                                           CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                       ----                                           ------------  ----------  ---------------  -------------  ---
  <trimmed>
  app                          myapp-prod-6c94f965f8-6pvhg               250m (1%)     1 (7%)      512M (0%)        1G (1%)        30d
  app                          myapp-stg-c8ccd55b6-2jld6                 250m (1%)     1 (7%)      512M (0%)        1G (1%)        30d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
<trimmed>

As you can see, this node also hosts myapp-prod-6c94f965f8-6pvhg and other endpoints that I trimmed, and those are reachable from mylinuxnodezoneb9. Only the endpoint belonging to myapp-stg-c8ccd55b6-2jld6 is not.

Here is kubectl describe node for mylinuxnodezoneb9:

k describe node mylinuxnodezoneb9 --cluster=test
Name:               mylinuxnodezoneb9
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=mylinuxnodezoneb9
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=worker
                    topology.kubernetes.io/region=myregion
                    topology.kubernetes.io/zone=zoneb
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.219.61.43/24
                    projectcalico.org/IPv4VXLANTunnelAddr: 192.168.235.64
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  <trimmed>
Taints:             <none>
Unschedulable:      false
Lease:              Failed to get lease: leases.coordination.k8s.io "mylinuxnodezoneb9" is forbidden: User "username"  cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease"
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Sun, 03 Nov 2024 21:45:53 -0500   Sun, 03 Nov 2024 21:45:53 -0500   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Mon, 18 Nov 2024 14:11:26 -0500   Sun, 03 Nov 2024 21:45:50 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 18 Nov 2024 14:11:26 -0500   Sun, 03 Nov 2024 21:45:50 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 18 Nov 2024 14:11:26 -0500   Sun, 03 Nov 2024 21:45:50 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 18 Nov 2024 14:11:26 -0500   Sun, 03 Nov 2024 21:45:50 -0500   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.219.61.43
  Hostname:    mylinuxnodezoneb9
Capacity:
<trimmed>
Allocatable:
<trimmed>
System Info:
  Machine ID:                         <trimmed>
  System UUID:                        <trimmed>
  Boot ID:                            <trimmed>
  Kernel Version:                     <trimmed>
  OS Image:                           Red Hat Enterprise Linux 8.10 (Ootpa)
  Operating System:                   linux
  Architecture:                       amd64
  Container Runtime Version:          cri-o://1.27.4
  Kubelet Version:                    v1.27.12
  Kube-Proxy Version:                 v1.27.12
PodCIDR:                              192.168.224.0/24
PodCIDRs:                             192.168.224.0/24
Non-terminated Pods:                  (37 in total)
  Namespace                           Name                                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                           ----                                                ------------  ----------  ---------------  -------------  ---
<trimmed>
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests           Limits
  --------           --------           ------
<trimmed>

And the PodCIDR range of mynodezonea1 is correctly registered in the routing table on mylinuxnodezoneb9:

avinesh@mylinuxnodezoneb9:/$ /usr/sbin/ip route | grep "192.168.236.0"
192.168.236.0/26 via 192.168.236.8 dev vxlan.calico onlink 
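With Calico IPAM, the blocks announced over VXLAN do not have to line up with the kubelet-assigned PodCIDR (note that the unreachable Pod IP 192.168.230.138 is outside the node's PodCIDR 192.168.236.0/24 shown above), so it may also be worth checking the route for the block that actually contains that IP. A sketch, assuming Calico's default /26 block size, under which 192.168.230.138 falls into 192.168.230.128/26:

# Look for the Calico IPAM block containing the unreachable Pod IP
/usr/sbin/ip route | grep "192.168.230.128"
# A healthy entry would look something like:
# 192.168.230.128/26 via 192.168.230.129 dev vxlan.calico onlink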

And, as mentioned earlier, the myapp-prod-6c94f965f8-6pvhg endpoint, which is running on the very same node, is reachable from mylinuxnodezoneb9:

k get pods -A --cluster=test -o wide | grep myapp-prod-6c94f965f8-6pvhg
app                                     myapp-prod-6c94f965f8-6pvhg                             1/1     Running                      0                  30d     192.168.230.189   mynodezonea1             <none>           <none>

avinesh@mylinuxnodezoneb9:/$ curl 192.168.230.189
avinesh@mylinuxnodezoneb9:/$ 

So it's not like node-to-node networking is completely broken. I do not use any network policy, nor are there firewall restrictions at the host level (a verification sketch follows).
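For reference, a quick way to confirm that no policies are in play (a sketch; the calicoctl commands assume calicoctl is installed and configured for this cluster):

# No Kubernetes NetworkPolicy objects in any namespace
kubectl get networkpolicy -A --cluster=test

# No Calico-native policies either
calicoctl get networkpolicy --all-namespaces
calicoctl get globalnetworkpolicy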

Your Environment