libopenstorage / stork

Stork - Storage Orchestration Runtime for Kubernetes
Apache License 2.0

PWX-37027: DirectAccessVolumes need not be considered while prioritizing nodes during scheduling. #1867

Closed: diptiranjanpx closed this 3 weeks ago

diptiranjanpx commented 1 month ago

Signed-Off-By: Diptiranjan

What type of PR is this?

Uncomment only one and also add the corresponding label in the PR: bug unit-test

What this PR does / why we need it:

  1. DirectAccessVolumes need not be considered while prioritizing nodes during scheduling.
  2. Also, if px goes offline, pods that use only FADA volumes need not be deleted as part of health monitoring.
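The first point can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Volume` type, its `DirectAccess` flag, the function names, and the score constants are all hypothetical stand-ins for whatever the driver and extender actually use.

```go
package main

import "fmt"

// Volume is a hypothetical, simplified view of a volume as seen by the
// scheduler extender. The real driver types differ.
type Volume struct {
	Name         string
	DirectAccess bool     // assumed flag marking a direct access (FADA) volume
	DataNodes    []string // nodes that hold a replica of the volume
}

// Illustrative constants only; they are not taken from the stork source.
const (
	defaultScore = 5
	localBonus   = 100
)

// volumesForScoring drops direct access volumes, because their data does
// not live on any cluster node and cannot make a node "more local".
func volumesForScoring(vols []Volume) []Volume {
	kept := make([]Volume, 0, len(vols))
	for _, v := range vols {
		if v.DirectAccess {
			continue // skipped from scoring
		}
		kept = append(kept, v)
	}
	return kept
}

// scoreNodes gives every node the default score and adds a bonus for each
// remaining volume that has a replica on that node.
func scoreNodes(nodes []string, vols []Volume) map[string]int {
	scores := make(map[string]int, len(nodes))
	for _, n := range nodes {
		scores[n] = defaultScore
	}
	for _, v := range volumesForScoring(vols) {
		for _, n := range v.DataNodes {
			if _, ok := scores[n]; ok {
				scores[n] += localBonus
			}
		}
	}
	return scores
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	fada := []Volume{{Name: "fada-vol", DirectAccess: true}}
	// With only a direct access volume, all nodes end up with equal scores.
	fmt.Println(scoreNodes(nodes, fada))
}
```

With only direct access volumes attached, every node keeps the same score, so no hyperconvergence warning would be needed in this sketch.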

Does this PR change a user-facing CRD or CLI?: no

Is a release note needed?:

Issue: Pods using direct access volumes used to show the event `Unable to schedule pod using volumes [*] in a hyperconverged fashion`.
User Impact: Pods got this event and were also deleted if they were scheduled on a storage node that went offline.
Resolution: Node prioritization is skipped for pods using direct access volumes, and such pods are not evicted if they are scheduled on a node whose storage goes offline.
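The eviction side of the resolution can be pictured with a small sketch. It assumes a hypothetical `PodVolume` type with a `DirectAccess` flag and a hypothetical `shouldEvict` helper; the health monitor in stork is structured differently, so treat this only as an illustration of the decision rule.

```go
package main

import "fmt"

// PodVolume is a hypothetical, simplified view of a driver volume used by
// a pod. DirectAccess is an assumed flag marking a FADA volume.
type PodVolume struct {
	Name         string
	DirectAccess bool
}

// shouldEvict reports whether a pod running on a node whose storage went
// offline should be deleted. A pod is evicted only if at least one of its
// volumes depends on the node-local storage service; pods that use only
// direct access volumes (or no driver volumes at all) are left running.
func shouldEvict(vols []PodVolume) bool {
	for _, v := range vols {
		if !v.DirectAccess {
			return true
		}
	}
	return false
}

func main() {
	onlyFADA := []PodVolume{{Name: "fada-vol", DirectAccess: true}}
	mixed := []PodVolume{
		{Name: "fada-vol", DirectAccess: true},
		{Name: "px-vol", DirectAccess: false},
	}
	fmt.Println("only FADA volumes, evict:", shouldEvict(onlyFADA))
	fmt.Println("mixed volumes, evict:", shouldEvict(mixed))
}
```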

Does this change need to be cherry-picked to a release branch?: yes, 24.4.0

Tests:

  1. Wrt scheduling, an event like `time="2024-04-24T16:07:22Z" level=warning msg="Unable to schedule pod using volumes [pvc-86da355f-f0cc-4a94-8245-97d3ac46ccbc] in a hyperconverged fashion. Make sure you have enough CPU and memory resources available on these nodes: []" Namespace=fada-namespace-141 Owner=ReplicaSet/px-pure-block04-24-16h00m59s-141-87bc9cf59 PodName=px-pure-block04-24-16h00m59s-141-87bc9cf59-c8vwp` is no longer seen.

➜ ~ kubectl -n busy-0 get po
NAME                                  READY   STATUS              RESTARTS   AGE
busybox-deployment-5d7db775bb-hbfg7   0/1     ContainerCreating   0          14s
➜ ~ kubectl -n busy-0 get po
NAME                                  READY   STATUS    RESTARTS   AGE
busybox-deployment-5d7db775bb-hbfg7   1/1     Running   0          20s
➜ ~ kubectl -n busy-0 describe po busybox-deployment-5d7db775bb-hbfg7
Name:             busybox-deployment-5d7db775bb-hbfg7
Namespace:        busy-0
Priority:         0
Node:             ip-10-13-230-144.pwx.purestorage.com/10.13.230.144
Start Time:       Fri, 18 Oct 2024 14:35:25 +0000
Labels:           app=busybox
                  pod-template-hash=5d7db775bb
Annotations:      cni.projectcalico.org/containerID: f48516a66360e1b3b276b87305de58c22db3f778856a8486d37d7eec64a258b3
                  cni.projectcalico.org/podIP: 10.233.80.79/32
                  cni.projectcalico.org/podIPs: 10.233.80.79/32
Status:           Running
IP:               10.233.80.79
IPs:
  IP:           10.233.80.79
Controlled By:  ReplicaSet/busybox-deployment-5d7db775bb
Containers:
  busybox:
    Container ID:  docker://5c83d8f9bf904065647cb8c62e3a2288b6fbb71d798acad408e75f27095086a4
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:768e5c6f5cb6db0794eec98dc7a967f40631746c32232b78a3105fb946f3ab83
    Port:
    Host Port:
    Args:
      /bin/sh
      -c
      while true; do date >> /data/out.txt; sleep 10; done
    State:          Running
      Started:      Fri, 18 Oct 2024 14:35:41 +0000
    Ready:          True
    Restart Count:  0
    Environment:
    Mounts:
      /data from data-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zztzg (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  data-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  busybox-pvc
    ReadOnly:   false
  kube-api-access-zztzg:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From     Message
  Warning  FailedScheduling  28s   stork    0/4 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  26s   stork    0/4 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
  Normal   Scheduled         24s   stork    Successfully assigned busy-0/busybox-deployment-5d7db775bb-hbfg7 to ip-10-13-230-144.pwx.purestorage.com
  Normal   Pulling           10s   kubelet  Pulling image "busybox"
  Normal   Pulled            9s    kubelet  Successfully pulled image "busybox" in 659ms (659ms including waiting). Image size: 4269694 bytes.
  Normal   Created           9s    kubelet  Created container busybox
  Normal   Started           9s    kubelet  Started container busybox

Logs wrt scoring:

time="2024-10-18T14:35:25Z" level=debug msg="Skipping volume pvc-2c97acb9-6615-44f1-9b2f-87bc1f225e33 from scoring" Namespace=busy-0 Owner=ReplicaSet/busybox-deployment-5d7db775bb PodName=busybox-deployment-5d7db775bb-hbfg7
time="2024-10-18T14:35:25Z" level=debug msg="Nodes in response:" Namespace=busy-0 Owner=ReplicaSet/busybox-deployment-5d7db775bb PodName=busybox-deployment-5d7db775bb-hbfg7
time="2024-10-18T14:35:25Z" level=debug msg="{Host:ip-10-13-226-123.pwx.purestorage.com Score:5}" Namespace=busy-0 Owner=ReplicaSet/busybox-deployment-5d7db775bb PodName=busybox-deployment-5d7db775bb-hbfg7
time="2024-10-18T14:35:25Z" level=debug msg="{Host:ip-10-13-230-144.pwx.purestorage.com Score:5}" Namespace=busy-0 Owner=ReplicaSet/busybox-deployment-5d7db775bb PodName=busybox-deployment-5d7db775bb-hbfg7
time="2024-10-18T14:35:25Z" level=debug msg="{Host:ip-10-13-227-115.pwx.purestorage.com Score:5}" Namespace=busy-0 Owner=ReplicaSet/busybox-deployment-5d7db775bb PodName=busybox-deployment-5d7db775bb-hbfg7


2. Wrt health monitor, this pod does not get evicted and will keep on running on the node where px goes offline.

➜ ~ kp get po -lname=stork
➜ ~ kubectl -n busy-0 get po -o wide
NAME                                  READY   STATUS    RESTARTS   AGE     IP             NODE                                   NOMINATED NODE   READINESS GATES
busybox-deployment-5d7db775bb-hbfg7   1/1     Running   0          2m25s   10.233.80.79   ip-10-13-230-144.pwx.purestorage.com
➜ ~ kp get nodes -o wide
NAME                                   STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
ip-10-13-226-123.pwx.purestorage.com   Ready                    10d   v1.30.0   10.13.226.123                 Ubuntu 20.04.2 LTS   5.4.0-107-generic   docker://26.1.2
ip-10-13-227-115.pwx.purestorage.com   Ready                    10d   v1.30.0   10.13.227.115                 Ubuntu 20.04.2 LTS   5.4.0-107-generic   docker://26.1.2
ip-10-13-228-109.pwx.purestorage.com   Ready    control-plane   10d   v1.30.0   10.13.228.109                 Ubuntu 20.04.2 LTS   5.4.0-107-generic   docker://26.1.2
ip-10-13-230-144.pwx.purestorage.com   Ready                    10d   v1.30.0   10.13.230.144                 Ubuntu 20.04.2 LTS   5.4.0-107-generic   docker://26.1.2
➜ ~ kubectl label node ip-10-13-230-144.pwx.purestorage.com px/service=stop
node/ip-10-13-230-144.pwx.purestorage.com labeled
➜ ~ kubectl -n busy-0 get po -o wide
NAME                                  READY   STATUS    RESTARTS   AGE     IP             NODE                                   NOMINATED NODE   READINESS GATES
busybox-deployment-5d7db775bb-hbfg7   1/1     Running   0          8m13s   10.233.80.79   ip-10-13-230-144.pwx.purestorage.com
➜ ~ kubectl -n busy-0 get po -o wide
NAME                                  READY   STATUS    RESTARTS   AGE   IP             NODE                                   NOMINATED NODE   READINESS GATES
busybox-deployment-5d7db775bb-hbfg7   1/1     Running   0          17m   10.233.80.79   ip-10-13-230-144.pwx.purestorage.com

strivedi-px commented 4 weeks ago

Can't access the CBTs so maybe we can check what's failing once the services are back up.