kubernetes-sigs / sig-windows-tools

Repository for tools and artifacts related to the sig-windows charter in Kubernetes. Scripts to assist kubeadm and wincat and flannel will be hosted here.
Apache License 2.0

Kube-proxy not found after setup #362

Closed yahbouss closed 2 months ago

yahbouss commented 7 months ago

Describe the bug

I am trying to create a Kubernetes cluster with one Windows node and an Ubuntu node. The Ubuntu node joined successfully and is working, but the Windows node is not. I ran hack/DebugWindowsNode.ps1, and it reported that kube-proxy.exe is not running as a service.

Here is my investigation so far:

> k get pods -n kube-system
NAME                              READY   STATUS             RESTARTS       AGE
coredns-5dd5756b68-f7v56          1/1     Running            0              2d18h
coredns-5dd5756b68-tnbst          1/1     Running            0              2d18h
etcd-ubuntu2                      1/1     Running            1              2d18h
kube-apiserver-ubuntu2            1/1     Running            3 (13h ago)    2d18h
kube-controller-manager-ubuntu2   1/1     Running            5 (13h ago)    2d18h
kube-proxy-42qwd                  1/1     Running            0              52m
kube-proxy-7nf29                  1/1     Running            0              52m
kube-proxy-windows-rrbl8          0/1     CrashLoopBackOff   13 (36s ago)   33m
kube-scheduler-ubuntu2            1/1     Running            5 (13h ago)    2d18h
metrics-server-98bc7f888-b4sv9    1/1     Running            0              22h
> kubectl describe pod kube-proxy-windows -n kube-system
Name:             kube-proxy-windows-rrbl8
Namespace:        kube-system
Priority:         0
Service Account:  kube-proxy
Node:             devops1/192.168.1.78
Start Time:       Thu, 08 Feb 2024 11:19:08 +0100
Labels:           controller-revision-hash=89b95b8fd
                  k8s-app=kube-proxy-windows
                  pod-template-generation=4
Annotations:      <none>
Status:           Running
IP:               192.168.1.78
IPs:
  IP:           192.168.1.78
Controlled By:  DaemonSet/kube-proxy-windows
Containers:
  kube-proxy:
    Container ID:   containerd://d274fbe620d3361d9ea56bc3289c3975b6375433f1f82764e6707a56f2220104
    Image:          sigwindowstools/kube-proxy:v1.28.0-flannel-hostprocess
    Image ID:       docker.io/sigwindowstools/kube-proxy@sha256:4c664d7b1e1354b0c5f74ece7b4b311832bb0b5df59efe2864d1fc63f7b8183f
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 08 Feb 2024 11:51:36 +0100
      Finished:     Thu, 08 Feb 2024 11:51:37 +0100
    Ready:          False
    Restart Count:  13
    Environment:
      KUBE_NETWORK:  flannel.4096
      CNI_BIN_PATH:  C:\\opt\\cni\\bin
      NODE_NAME:      (v1:spec.nodeName)
      POD_IP:         (v1:status.podIP)
    Mounts:
      /mounts/var/lib/kube-proxy from kube-proxy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jzdk7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-proxy:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
  kube-api-access-jzdk7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=windows
Tolerations:                 op=Exists
                             CriticalAddonsOnly op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  36m                 default-scheduler  Successfully assigned kube-system/kube-proxy-windows-rrbl8 to devops1
  Normal   Pulled     36m                 kubelet            Successfully pulled image "sigwindowstools/kube-proxy:v1.28.0-flannel-hostprocess" in 1.479s (1.479s including waiting)
  Normal   Pulled     35m                 kubelet            Successfully pulled image "sigwindowstools/kube-proxy:v1.28.0-flannel-hostprocess" in 1.525s (1.525s including waiting)
  Normal   Pulling    35m (x3 over 36m)   kubelet            Pulling image "sigwindowstools/kube-proxy:v1.28.0-flannel-hostprocess"
  Normal   Created    35m (x3 over 36m)   kubelet            Created container kube-proxy
  Normal   Started    35m (x3 over 36m)   kubelet            Started container kube-proxy
  Normal   Pulled     35m                 kubelet            Successfully pulled image "sigwindowstools/kube-proxy:v1.28.0-flannel-hostprocess" in 1.517s (1.517s including waiting)
  Warning  BackOff    35m (x2 over 35m)   kubelet            Back-off restarting failed container kube-proxy in pod kube-proxy-windows-rrbl8_kube-system(211b5be2-6204-489f-a8ee-b3f0cc97b2a6)
  Normal   Pulled     35m                 kubelet            Successfully pulled image "sigwindowstools/kube-proxy:v1.28.0-flannel-hostprocess" in 1.473s (1.473s including waiting)
  Normal   Pulled     34m                 kubelet            Successfully pulled image "sigwindowstools/kube-proxy:v1.28.0-flannel-hostprocess" in 1.461s (1.461s including waiting)
  Normal   Pulled     34m                 kubelet            Successfully pulled image "sigwindowstools/kube-proxy:v1.28.0-flannel-hostprocess" in 1.504s (1.504s including waiting)
  Normal   Pulling    33m (x4 over 35m)   kubelet            Pulling image "sigwindowstools/kube-proxy:v1.28.0-flannel-hostprocess"
  Normal   Pulled     33m                 kubelet            Successfully pulled image "sigwindowstools/kube-proxy:v1.28.0-flannel-hostprocess" in 1.452s (1.452s including waiting)
  Normal   Started    33m (x4 over 35m)   kubelet            Started container kube-proxy
  Normal   Created    33m (x4 over 35m)   kubelet            Created container kube-proxy
  Warning  BackOff    7s (x157 over 35m)  kubelet            Back-off restarting failed container kube-proxy in pod kube-proxy-windows-rrbl8_kube-system(211b5be2-6204-489f-a8ee-b3f0cc97b2a6)
> kubectl describe node devops1
Name:               devops1
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=windows
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=devops1
                    kubernetes.io/os=windows
                    node-role.kubernetes.io/worker=win-worker
                    node.kubernetes.io/windows-build=10.0.20348
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: npipe:////./pipe/containerd-containerd
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 07 Feb 2024 18:07:36 +0100
Taints:             node.kubernetes.io/not-ready:NoExecute
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  devops1
  AcquireTime:     <unset>
  RenewTime:       Thu, 08 Feb 2024 11:56:41 +0100
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 08 Feb 2024 11:55:40 +0100   Thu, 08 Feb 2024 09:36:50 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 08 Feb 2024 11:55:40 +0100   Thu, 08 Feb 2024 09:36:50 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 08 Feb 2024 11:55:40 +0100   Thu, 08 Feb 2024 09:36:50 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Thu, 08 Feb 2024 11:55:40 +0100   Thu, 08 Feb 2024 09:36:50 +0100   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Addresses:
  InternalIP:  192.168.1.78
  Hostname:    devops1
Capacity:
  cpu:                8
  ephemeral-storage:  487730172Ki
  memory:             16658320Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  449492125771
  memory:             16555920Ki
  pods:               110
System Info:
  Machine ID:                 DevOps1
  System UUID:                4C4C4544-0031-5610-8046-B5C04F574C32
  Boot ID:                    11
  Kernel Version:             10.0.20348.350
  OS Image:                   Windows Server 2022 Datacenter
  Operating System:           windows
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.1
  Kubelet Version:            v1.28.0
  Kube-Proxy Version:         v1.28.0
PodCIDR:                      10.244.2.0/24
PodCIDRs:                     10.244.2.0/24
Non-terminated Pods:          (1 in total)
  Namespace                   Name                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                        ------------  ----------  ---------------  -------------  ---
  kube-system                 kube-proxy-windows-rrbl8    0 (0%)        0 (0%)      0 (0%)           0 (0%)         37m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests  Limits
  --------           --------  ------
  cpu                0 (0%)    0 (0%)
  memory             0 (0%)    0 (0%)
  ephemeral-storage  0 (0%)    0 (0%)
Events:
  Type    Reason                   Age   From     Message
  ----    ------                   ----  ----     -------
  Normal  Starting                 36m   kubelet  Starting kubelet.
  Normal  NodeHasSufficientMemory  36m   kubelet  Node devops1 status is now: NodeHasSufficientMemory
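
The key line in the node description above is the `Ready` condition, which is `False` with reason `KubeletNotReady` and a message about the CNI plugin not being initialized. As a minimal sketch, the same check could be scripted against the JSON form of the node object. The function name and the hand-trimmed sample below are illustrative assumptions; in practice the dict would come from parsing the output of `kubectl get node devops1 -o json`.

```python
def not_ready_reasons(node):
    """Return 'reason: message' strings for a Ready condition that is not True."""
    conditions = node.get("status", {}).get("conditions", [])
    return [
        f"{c.get('reason', '')}: {c.get('message', '')}"
        for c in conditions
        if c.get("type") == "Ready" and c.get("status") != "True"
    ]


# Sample trimmed by hand from the describe output above (assumption: the JSON
# form carries the same type/status/reason/message fields per condition).
sample_node = {
    "status": {
        "conditions": [
            {
                "type": "MemoryPressure",
                "status": "False",
                "reason": "KubeletHasSufficientMemory",
                "message": "kubelet has sufficient memory available",
            },
            {
                "type": "Ready",
                "status": "False",
                "reason": "KubeletNotReady",
                "message": "container runtime network not ready: NetworkReady=false "
                           "reason:NetworkPluginNotReady message:Network plugin "
                           "returns error: cni plugin not initialized",
            },
        ]
    }
}

for line in not_ready_reasons(sample_node):
    print(line)
```

Only the `Ready` condition is inspected here, because the pressure conditions in the output above all report a healthy state.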

To Reproduce

Steps to reproduce the behavior. I ran:

  1. ./Install-Containerd.ps1
  2. ./PrepareNode.ps1
  3. kubeadm join ...
  4. curl.exe -LO https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/calico/kube-proxy/kube-proxy.yml
  5. (Get-Content "kube-proxy.yml") -Replace 'image: (.*):(.*)-(.*)-(.*)$', 'image: $1:v1.28.0-$3-$4' | Set-Content "kube-proxy.yml"
  6. kubectl apply -f kube-proxy.yml (after copying it to the master node)
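
To make the effect of step 5 concrete, here is a rough Python equivalent of the `-Replace` call. It assumes each capture group in step 5 is `(.*)` (greedy "any characters"), and the input line is hypothetical, since the issue does not show the tag that kube-proxy.yml contained before the rewrite.

```python
import re

# Assumption: each capture group is (.*), so the pattern splits an image line
# into repository, version, CNI name and flavor.
PATTERN = re.compile(r"image: (.*):(.*)-(.*)-(.*)$")


def pin_kube_proxy_version(line, version="v1.28.0"):
    """Swap the version part of an 'image: repo:version-cni-flavor' line."""
    return PATTERN.sub(
        lambda m: f"image: {m.group(1)}:{version}-{m.group(3)}-{m.group(4)}",
        line,
    )


# Hypothetical manifest line, used only for illustration.
before = "        image: sigwindowstools/kube-proxy:v1.27.1-flannel-hostprocess"
print(pin_kube_proxy_version(before))
```

Only the version segment changes; the repository, CNI name and flavor are carried over from the original line, which matches the `v1.28.0-flannel-hostprocess` tag seen in the pod description.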

Expected behavior

The Windows node should be Ready; instead it is NotReady.

Kubernetes (please complete the following information):

Additional context

When nothing worked, I tried changing the kube-proxy version and reverting it, then checked the containerd logs on the Windows node. This is the log:

2024-02-08T11:56:46.5215791+01:00 stdout F Write files so the kubeconfig points to correct locations
2024-02-08T11:56:46.5807522+01:00 stdout F 
2024-02-08T11:56:46.5808445+01:00 stdout F 
2024-02-08T11:56:46.581363+01:00 stdout F     Directory: C:\var\lib
2024-02-08T11:56:46.581363+01:00 stdout F 
2024-02-08T11:56:46.581363+01:00 stdout F 
2024-02-08T11:56:46.5825361+01:00 stdout F Mode                 LastWriteTime         Length Name                                                                 
2024-02-08T11:56:46.5830454+01:00 stdout F ----                 -------------         ------ ----                                                                 
2024-02-08T11:56:46.5830454+01:00 stdout F d-----        08/02/2024     09:52                kube-proxy                                                           
2024-02-08T11:56:46.5963987+01:00 stdout F Finding sourcevip
2024-02-08T11:56:47.0771863+01:00 stderr F Cannot index into a null array.
2024-02-08T11:56:47.0771863+01:00 stderr F At C:\hpc\kube-proxy\start.ps1:19 char:9
2024-02-08T11:56:47.0771863+01:00 stderr F +         $subnet = $hnsNetwork.Subnets[0].AddressPrefix
2024-02-08T11:56:47.0771863+01:00 stderr F +         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2024-02-08T11:56:47.0771863+01:00 stderr F     + CategoryInfo          : InvalidOperation: (:) [], ParentContainsErrorRecordException
2024-02-08T11:56:47.0771863+01:00 stderr F     + FullyQualifiedErrorId : NullArray
2024-02-08T11:56:47.0771863+01:00 stderr F  
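
The log above is in the container runtime's line format: timestamp, stream, a tag (`F` for a full line), then the message. The stderr lines show that `$hnsNetwork.Subnets` evaluated to null at start.ps1 line 19, which is consistent with the HNS network lookup returning nothing, though the log alone does not confirm why. As a small sketch for pulling just the error lines out of such a log (the helper names are illustrative, not part of any tool in this repository):

```python
from typing import NamedTuple, Optional


class CriLogLine(NamedTuple):
    timestamp: str
    stream: str   # "stdout" or "stderr"
    tag: str      # "F" = full line, "P" = partial line
    message: str


def parse_cri_line(raw: str) -> Optional[CriLogLine]:
    """Split one log line into its four fields; return None if it does not fit."""
    parts = raw.rstrip("\n").split(" ", 3)
    if len(parts) < 3 or parts[1] not in ("stdout", "stderr"):
        return None
    message = parts[3] if len(parts) == 4 else ""
    return CriLogLine(parts[0], parts[1], parts[2], message)


def stderr_messages(lines):
    """Collect the non-empty messages that were written to stderr."""
    out = []
    for raw in lines:
        parsed = parse_cri_line(raw)
        if parsed and parsed.stream == "stderr" and parsed.message.strip():
            out.append(parsed.message)
    return out


# A few lines copied from the log above.
sample = [
    "2024-02-08T11:56:46.5963987+01:00 stdout F Finding sourcevip",
    "2024-02-08T11:56:47.0771863+01:00 stderr F Cannot index into a null array.",
    "2024-02-08T11:56:47.0771863+01:00 stderr F At C:\\hpc\\kube-proxy\\start.ps1:19 char:9",
    "2024-02-08T11:56:47.0771863+01:00 stderr F +         $subnet = $hnsNetwork.Subnets[0].AddressPrefix",
]

for msg in stderr_messages(sample):
    print(msg)
```

Filtering on the stream field separates the PowerShell error from the normal startup output, which makes the failing statement easier to spot in longer logs.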

jsturtevant commented 7 months ago

Can you try building the image on your own? We have not built every version of the flannel images.

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

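This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage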

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 2 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/sig-windows-tools/issues/362#issuecomment-2212524604):

>The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
>This bot triages issues according to the following rules:
>- After 90d of inactivity, `lifecycle/stale` is applied
>- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
>- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
>You can:
>- Reopen this issue with `/reopen`
>- Mark this issue as fresh with `/remove-lifecycle rotten`
>- Offer to help out with [Issue Triage][1]
>
>Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
>/close not-planned
>
>[1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.