Terminating Windows pods hangs on FailedKillPod: ... hcnDeleteNamespace: ... The specified request is unsupported

doctorpangloss commented 2 years ago

Terminating pods during a regular deployment scale-down on a Windows Calico worker get stuck on an unusual error.

Expected Behavior

The pods should cleanly terminate. They sometimes do.

Current Behavior

Events:
  Type     Reason         Age                     From     Message
  ----     ------         ----                    ----     -------
  Warning  FailedKillPod  3m37s (x6055 over 22h)  kubelet  error killing pod: failed to "KillPodSandbox" for "241a6b05-c158-44c3-b027-b6bcb6ab4d0e" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to remove network namespace for sandbox \"5d496ce20db3ecbc8e524531ab6e093936afc1392ed156bffb337b3cd36d9896\": hcnDeleteNamespace failed in Win32: The specified request is unsupported. (0x803b0015) {\"Success\":false,\"Error\":\"The specified request is unsupported. \",\"ErrorCode\":2151350293}"

The underlying process appears to have successfully exited.
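
For reference, the decimal ErrorCode in the JSON payload and the hexadecimal code in the message appear to be the same value written two ways. A minimal shell sketch of the conversion follows; the helper name `to_hresult_hex` is made up for illustration and is not part of any tool mentioned here.

```shell
# Convert the unsigned decimal ErrorCode from the HCN JSON payload into
# the 0x-prefixed hexadecimal form shown in the event message.
# to_hresult_hex is a hypothetical helper name used only for this sketch.
to_hresult_hex() {
  printf '0x%08x\n' "$1"
}

# Usage:
#   to_hresult_hex 2151350293   # prints 0x803b0015
```

This only confirms that both fields describe one error; it does not explain why HCN rejects the namespace deletion.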

Possible Solution

Not sure.

Steps to Reproduce (for bugs)

Challenging to reproduce. One note: I run the core process as a Windows service inside the Windows container, which may be contributing to a race condition.

Context

I experience a lot of issues using containers on Windows, and I am sophisticated, so I am not sure if Calico is strictly to blame here.

Your Environment

Calico version

C:/CalicoWindows # calicoctl version
Client Version:    v3.23.1
Git commit:        967e24543
Cluster Version:   v3.23.1
Cluster Type:      k8s,kdd,typha,operator,ecs,win

Orchestrator version (e.g. kubernetes, mesos, rkt):

C:/CalicoWindows # kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.6", GitCommit:"f59f5c2fda36e4036b49ec027e556a15456108f0", GitTreeState:"clean", BuildDate:"2022-01-19T17:33:06Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.15-eks-fb459a0", GitCommit:"be82fa628e60d024275efaa239bfe53a9119c2d9", GitTreeState:"clean", BuildDate:"2022-10-24T20:33:23Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

(1.22 on AWS EKS + on premises nodes)

Operating System and version: Windows 2022 LTSC

coutinhop commented 1 year ago

This looks like it might be the same issue as https://github.com/projectcalico/calico/issues/5828, which was fixed by this PR: https://github.com/projectcalico/calico/pull/6656. Unfortunately, it hasn't made it into any release yet; it is scheduled for Calico v3.25.

coutinhop commented 1 year ago

Closing as fixed by https://github.com/projectcalico/calico/pull/6656

doctorpangloss commented 4 months ago

This issue persists with v3.26.4 which should include this change.

coutinhop commented 3 months ago

I think the issue now is that we no longer use https://github.com/projectcalico/calico/blob/master/node/windows-packaging/CalicoWindows/kubernetes/kubelet-service.ps1 to run kubelet. Instead, kubelet is installed by this script, which is maintained by the Kubernetes sig-windows group: https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/hostprocess/PrepareNode.ps1

If I'm understanding things correctly, this is a problem with Windows containerization: it doesn't prioritize Calico pods on reboots. At least, the original issue would happen on node reboot; @doctorpangloss, could you please provide more details about when this is being hit? I'm not sure there's much we can do as a workaround, but you could also try replacing the kubelet service with the one from the Calico repo and see whether the inclusion of Wait-ForCalicoInit() solves it for you.

coutinhop commented 2 months ago

@doctorpangloss were you able to test this?

doctorpangloss commented 2 months ago

Because I use hostprocess containers to deploy Calico, what would be the most sensible way to use Wait-ForCalicoInit?

1. Configure descheduler to remove pods from nodes based on taints
2. A cluster-wide controller observes Windows nodes
3. Windows nodes that are going to use Calico should be tainted on startup - how would I do this?
4. Remove the taint when the daemon sees that Calico is running?
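
One possible shape for the taint-on-startup idea, as a sketch only: kubelet can register a node with a taint already applied (the `--register-with-taints` flag, or `registerWithTaints` in the kubelet configuration file), and whatever observes Calico readiness can then remove the taint with `kubectl taint`. The taint key and helper names below are made-up placeholders, and nothing in this thread confirms that this approach avoids the error.

```shell
# KUBECTL can be overridden (for example with "echo") to preview commands.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical taint key; choose one that fits your cluster's conventions.
TAINT_KEY="example.com/calico-not-ready"

# Apply the taint to a node that registered without it.
taint_node() {
  $KUBECTL taint nodes "$1" "${TAINT_KEY}=true:NoSchedule" --overwrite
}

# Remove the taint once Calico is observed running on the node.
# The trailing "-" is kubectl's syntax for removing a taint.
untaint_node() {
  $KUBECTL taint nodes "$1" "${TAINT_KEY}:NoSchedule-"
}

# Usage (node name is a placeholder):
#   taint_node my-windows-node
#   untaint_node my-windows-node
```

Workloads without a matching toleration would then stay off the node until the taint is removed, while the Calico pods themselves would need to tolerate it.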

doctorpangloss commented 2 months ago

The issue still occurs with Calico v3.26.4 and Kubernetes 1.29.7:

{
  "namespace": "...",
  "podName": "dinkydiner-unity-deployment-85d8f5bcbc-wg44f",
  "reason": "FailedKillPod",
  "message": "error killing pod: failed to \"KillPodSandbox\" for \"206c8fee-4106-4f8e-b6a6-7bdcbba9bdd6\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to remove network namespace for sandbox \\\"2e9f06d178440fa214f9bf529a05fd91b3017e03d506ef34b08891d6541e708f\\\": hcnDeleteNamespace failed in Win32: The specified request is unsupported. (0x803b0015) {\\\"Success\\\":false,\\\"Error\\\":\\\"The specified request is unsupported. \\\",\\\"ErrorCode\\\":2151350293}\"",
  "count": 6493,
  "lastTimestamp": "2024-09-10T17:33:14Z"
}

I still see this issue:

$ kubectl describe pods -n ...  dinkydiner-unity-deployment-85d8f5bcbc-rmrpg
Name:                      dinkydiner-unity-deployment-85d8f5bcbc-rmrpg
Namespace:                 ...
Priority:                  0
Service Account:           default
Node:                      .../...
Start Time:                Fri, 30 Aug 2024 18:08:14 -0700
Labels:                    app.kubernetes.io/instance=dinkydiner-unity-deployment
                           app.kubernetes.io/name=unity-deployment
                           pod-template-hash=85d8f5bcbc
Annotations:               appmana.artifactId: .../dinkydiner
                           appmana.project: dinkydiner
                           cni.projectcalico.org/containerID: fac9a792aa7f303f8e9225d493d28335bca9e8445e62b119684fef9b591cf090
                           cni.projectcalico.org/podIP:
                           cni.projectcalico.org/podIPs:
Status:                    Terminating (lasts 24h)
Termination Grace Period:  30s
IP:                        10.3.245.169
IPs:
  IP:           10.3.245.169
Controlled By:  ReplicaSet/dinkydiner-unity-deployment-85d8f5bcbc
Containers:
  dinkydiner-unity-deployment:
    Container ID:  containerd://bc3286c5a71bc3eb73357e51846eb06471fd0b0f23bd46bb6f2ea33ff081ad14
    Image:         ...
    State:          Terminated
      Reason:       Error
      Exit Code:    -1073741510
      Started:      Sun, 08 Sep 2024 15:29:46 -0700
      Finished:     Mon, 09 Sep 2024 10:23:21 -0700
    Ready:          False
    Restart Count:  2
    Limits:
      microsoft.com/directx:  1
    Requests:
      cpu:                    2
      memory:                 1000Mi
      microsoft.com/directx:  1
    Environment:              <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6kz8c (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   False
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
...
QoS Class:                    Burstable
Node-Selectors:               kubernetes.io/arch=amd64
                              kubernetes.io/os=windows
Tolerations:                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/os=windows:NoSchedule
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                              nvidia.com/gpu:NoSchedule op=Exists
                              nvidia.com/gpu=present:NoSchedule
Topology Spread Constraints:  kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/instance=dinkydiner-unity-deployment,app.kubernetes.io/name=unity-deployment
Events:
  Type     Reason         Age                  From     Message
  ----     ------         ----                 ----     -------
  Warning  FailedKillPod  3s (x6596 over 24h)  kubelet  error killing pod: failed to "KillPodSandbox" for "dcf6df12-53af-4b8f-8354-57df98d673f2" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to remove network namespace for sandbox \"fac9a792aa7f303f8e9225d493d28335bca9e8445e62b119684fef9b591cf090\": hcnDeleteNamespace failed in Win32: The specified request is unsupported. (0x803b0015) {\"Success\":false,\"Error\":\"The specified request is unsupported. \",\"ErrorCode\":2151350293}"

However, the task is definitely not running:

$ ctr -n k8s.io t ls | grep bc3286c5a71bc3eb73357e51846eb06471fd0b0f23bd46bb6f2ea33ff081ad14

(it's empty)
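
A note on the exit code in the describe output above: Windows exit codes are unsigned 32-bit values, and -1073741510 is the signed rendering of 0xC000013A, the NTSTATUS value STATUS_CONTROL_C_EXIT. That would be consistent with the container having been stopped rather than crashing on its own, though it says nothing about why the namespace removal fails. A small sketch of the conversion, with a hypothetical helper name:

```shell
# Reinterpret a signed 32-bit exit code as the unsigned hexadecimal
# NTSTATUS value that Windows documentation uses.
# exit_code_to_ntstatus is a hypothetical helper name for this sketch.
exit_code_to_ntstatus() {
  printf '0x%08X\n' "$(( $1 & 0xFFFFFFFF ))"
}

# Usage:
#   exit_code_to_ntstatus -1073741510   # prints 0xC000013A
```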

Would it make more sense to author a descheduler policy that force deletes pods with this error?

coutinhop commented 1 month ago

> Because I use hostprocess containers to deploy Calico, what would be the most sensible way to use Wait-ForCalicoInit?

I meant to stop and remove the kubelet service (if installed from https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/hostprocess/PrepareNode.ps1), then install it running this script: https://github.com/projectcalico/calico/blob/master/node/windows-packaging/CalicoWindows/kubernetes/install-kube-services.ps1 (with a caveat that it may be outdated, appreciate feedback if you find issues). All of this needs to be done on the host, as that's where kubelet runs.

Though I'm not sure if that would in fact solve this problem.

> Would it make more sense to author a descheduler policy that force deletes pods with this error?

Could you elaborate? My k8s "noobness" might be showing, but is that something possible to do via configuration? Or did you mean for us to write such a tool? Just asking to understand better where to begin...

coutinhop commented 3 weeks ago

@doctorpangloss a ping about ^

doctorpangloss commented 3 weeks ago

@coutinhop

> Could you elaborate?

I have a CronJob that finds pods suffering from this error and force terminates them:

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-cleanup-job
spec:
  schedule: "*/2 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: ...
          nodeSelector:
            kubernetes.io/os: linux
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  # calico 3.26 fixes
                  # Find pods that are being deleted (deletionTimestamp set) and whose first
                  # container is in one of the known stuck states. "Terminating" is a kubectl
                  # display status, not a pod phase, so pods stuck in ContainerCreating are
                  # matched through the "Pending" phase once deletion is over an hour old.
                  kubectl get pods --all-namespaces -o json | jq -r '
                    .items[] |
                    select(.metadata.deletionTimestamp != null) |
                    select(
                      .status.phase == "Running" or
                      .status.phase == "Failed" or
                      (.status.phase == "Pending" and (now - (.metadata.deletionTimestamp | fromdateiso8601)) > 3600)
                    ) |
                    select(
                      (.status.containerStatuses[0].state.terminated.reason == "Error" and .status.containerStatuses[0].state.terminated.exitCode == -1073741510) or
                      (.status.containerStatuses[0].state.terminated.reason == "StartError" and .status.containerStatuses[0].state.terminated.exitCode == 128) or
                      .status.containerStatuses[0].state.waiting.reason == "ContainerCreating"
                    ) |
                    "\(.metadata.namespace) \(.metadata.name)"
                  ' | while read -r namespace pod; do
                    # Fetch the pod's events once and reuse them for each check.
                    events="$(kubectl get events --field-selector "involvedObject.name=$pod" -n "$namespace")"
                    if printf '%s\n' "$events" | grep -q 'FailedKillPod.*hcnDeleteNamespace.*The specified request is unsupported'; then
                      echo "Forcefully deleting pod $pod in namespace $namespace due to FailedKillPod"
                      kubectl delete pod "$pod" -n "$namespace" --force --grace-period=0
                    elif printf '%s\n' "$events" | grep -q 'StartError.*failed to create containerd task: failed to create shim task: hcs::CreateComputeSystem.*The endpoint was not found'; then
                      echo "Forcefully deleting pod $pod in namespace $namespace due to StartError"
                      kubectl delete pod "$pod" -n "$namespace" --force --grace-period=0
                    # The jq filter above already guarantees the pod is being deleted.
                    # POSIX sh compares strings with "=", not "==".
                    elif [ "$(kubectl get pod "$pod" -n "$namespace" -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}')" = "ContainerCreating" ]; then
                      echo "Forcefully deleting pod $pod in namespace $namespace due to stuck in Terminating state"
                      kubectl delete pod "$pod" -n "$namespace" --force --grace-period=0
                    fi
                  done

I wish I comprehended what the error was actually saying or what was going wrong.
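
The event-matching step of the CronJob above can be checked offline against a captured event line before it is trusted to force delete anything. A minimal sketch, reusing the same grep pattern; the function name `is_hcn_delete_failure` is hypothetical.

```shell
# Succeed if the text on stdin contains the FailedKillPod event discussed
# in this thread. The pattern is the one the CronJob greps for.
# is_hcn_delete_failure is a hypothetical helper name for this sketch.
is_hcn_delete_failure() {
  grep -q 'FailedKillPod.*hcnDeleteNamespace.*The specified request is unsupported'
}

# Usage against a live cluster (namespace and pod are placeholders):
#   kubectl get events -n "$namespace" --field-selector "involvedObject.name=$pod" \
#     | is_hcn_delete_failure && echo "stuck on hcnDeleteNamespace"
```

Keeping the match in one function makes it easier to tighten the pattern later without touching the deletion logic.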

coutinhop commented 1 week ago

Thanks @doctorpangloss! If possible, could you try this procedure I mentioned:

> I meant to stop and remove the kubelet service (if installed from https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/hostprocess/PrepareNode.ps1), then install it running this script: https://github.com/projectcalico/calico/blob/master/node/windows-packaging/CalicoWindows/kubernetes/install-kube-services.ps1 (with a caveat that it may be outdated, appreciate feedback if you find issues). All of this needs to be done on the host, as that's where kubelet runs.

This can at least help us narrow down a root cause, if it fixes things...

In the meantime, I can look into adding a similar cleanup to Calico...
