Uncomment only one and also add the corresponding label in the PR:
bug
unit-test
What this PR does / why we need it:
Direct access volumes do not need to be considered when prioritizing nodes during scheduling.
Also, if px goes offline, pods that use only FADA volumes do not need to be deleted as part of health monitoring.
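A minimal sketch of the scoring side of this change, assuming a simplified model: the `Volume` type, the `volumesForScoring` and `scoreNodes` helpers, and the score values are all hypothetical and are not the actual stork types or functions. It only illustrates the idea that direct access volumes are dropped before node scores are computed, so they cannot influence node priority.

```go
package main

import "fmt"

// Volume is a hypothetical, simplified stand-in for the volume information
// a scheduler extender works with. It is not the real stork type.
type Volume struct {
	Name         string
	DirectAccess bool     // true for direct access (for example FADA) volumes
	DataNodes    []string // nodes that hold the volume's data locally
}

// volumesForScoring returns only the volumes that should influence node
// priority. Direct access volumes are skipped.
func volumesForScoring(vols []Volume) []Volume {
	kept := make([]Volume, 0, len(vols))
	for _, v := range vols {
		if v.DirectAccess {
			continue // skipped from scoring
		}
		kept = append(kept, v)
	}
	return kept
}

// scoreNodes gives every node a base score and adds a bonus for each
// volume whose data is local to that node. The numbers are assumptions
// chosen for illustration only.
func scoreNodes(nodes []string, vols []Volume) map[string]int {
	const baseScore, localBonus = 5, 10
	scores := make(map[string]int, len(nodes))
	for _, n := range nodes {
		scores[n] = baseScore
		for _, v := range vols {
			for _, d := range v.DataNodes {
				if d == n {
					scores[n] += localBonus
					break
				}
			}
		}
	}
	return scores
}

func main() {
	nodes := []string{"node-a", "node-b"}
	vols := []Volume{
		{Name: "fada-vol", DirectAccess: true},
		{Name: "px-vol", DataNodes: []string{"node-b"}},
	}
	fmt.Println(scoreNodes(nodes, volumesForScoring(vols)))
}
```

In this sketch, a pod that uses only direct access volumes ends up with the same base score on every node, so no node is preferred for data locality.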
Does this PR change a user-facing CRD or CLI?:
no
Is a release note needed?:
Issue: Pods using direct access volumes used to show the event `Unable to schedule pod using volumes [*] in a hyperconverged fashion`.
User Impact: These pods received the event and were also deleted if they were scheduled on a storage node that went offline.
Resolution: Node prioritization is skipped for pods using direct access volumes, and these pods are not evicted if storage goes offline on the node they are scheduled on.
Does this change need to be cherry-picked to a release branch?:
yes, 24.4.0
Tests:
Scheduling:
The following warning is no longer seen:
time="2024-04-24T16:07:22Z" level=warning msg="Unable to schedule pod using volumes [pvc-86da355f-f0cc-4a94-8245-97d3ac46ccbc] in a hyperconverged fashion. Make sure you have enough CPU and memory resources available on these nodes: []" Namespace=fada-namespace-141 Owner=ReplicaSet/px-pure-block04-24-16h00m59s-141-87bc9cf59 PodName=px-pure-block04-24-16h00m59s-141-87bc9cf59-c8vwp
➜ ~ kubectl -n busy-0 get po
NAME READY STATUS RESTARTS AGE
busybox-deployment-5d7db775bb-hbfg7 0/1 ContainerCreating 0 14s
➜ ~ kubectl -n busy-0 get po
NAME READY STATUS RESTARTS AGE
busybox-deployment-5d7db775bb-hbfg7 1/1 Running 0 20s
➜ ~ kubectl -n busy-0 describe po busybox-deployment-5d7db775bb-hbfg7
Name: busybox-deployment-5d7db775bb-hbfg7
Namespace: busy-0
Priority: 0
Node: ip-10-13-230-144.pwx.purestorage.com/10.13.230.144
Start Time: Fri, 18 Oct 2024 14:35:25 +0000
Labels: app=busybox
pod-template-hash=5d7db775bb
Annotations: cni.projectcalico.org/containerID: f48516a66360e1b3b276b87305de58c22db3f778856a8486d37d7eec64a258b3
cni.projectcalico.org/podIP: 10.233.80.79/32
cni.projectcalico.org/podIPs: 10.233.80.79/32
Status: Running
IP: 10.233.80.79
IPs:
IP: 10.233.80.79
Controlled By: ReplicaSet/busybox-deployment-5d7db775bb
Containers:
busybox:
Container ID: docker://5c83d8f9bf904065647cb8c62e3a2288b6fbb71d798acad408e75f27095086a4
Image: busybox
Image ID: docker-pullable://busybox@sha256:768e5c6f5cb6db0794eec98dc7a967f40631746c32232b78a3105fb946f3ab83
Port:
Host Port:
Args:
/bin/sh
-c
while true; do date >> /data/out.txt; sleep 10; done
State: Running
Started: Fri, 18 Oct 2024 14:35:41 +0000
Ready: True
Restart Count: 0
Environment:
Mounts:
/data from data-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zztzg (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
data-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: busybox-pvc
ReadOnly: false
kube-api-access-zztzg:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
Warning FailedScheduling 28s stork 0/4 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Warning FailedScheduling 26s stork 0/4 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Normal Scheduled 24s stork Successfully assigned busy-0/busybox-deployment-5d7db775bb-hbfg7 to ip-10-13-230-144.pwx.purestorage.com
Normal Pulling 10s kubelet Pulling image "busybox"
Normal Pulled 9s kubelet Successfully pulled image "busybox" in 659ms (659ms including waiting). Image size: 4269694 bytes.
Normal Created 9s kubelet Created container busybox
Normal Started 9s kubelet Started container busybox
Signed-Off-By: Diptiranjan
Logs for scoring:
time="2024-10-18T14:35:25Z" level=debug msg="Skipping volume pvc-2c97acb9-6615-44f1-9b2f-87bc1f225e33 from scoring" Namespace=busy-0 Owner=ReplicaSet/busybox-deployment-5d7db775bb PodName=busybox-deployment-5d7db775bb-hbfg7
time="2024-10-18T14:35:25Z" level=debug msg="Nodes in response:" Namespace=busy-0 Owner=ReplicaSet/busybox-deployment-5d7db775bb PodName=busybox-deployment-5d7db775bb-hbfg7
time="2024-10-18T14:35:25Z" level=debug msg="{Host:ip-10-13-226-123.pwx.purestorage.com Score:5}" Namespace=busy-0 Owner=ReplicaSet/busybox-deployment-5d7db775bb PodName=busybox-deployment-5d7db775bb-hbfg7
time="2024-10-18T14:35:25Z" level=debug msg="{Host:ip-10-13-230-144.pwx.purestorage.com Score:5}" Namespace=busy-0 Owner=ReplicaSet/busybox-deployment-5d7db775bb PodName=busybox-deployment-5d7db775bb-hbfg7
time="2024-10-18T14:35:25Z" level=debug msg="{Host:ip-10-13-227-115.pwx.purestorage.com Score:5}" Namespace=busy-0 Owner=ReplicaSet/busybox-deployment-5d7db775bb PodName=busybox-deployment-5d7db775bb-hbfg7
➜ ~ kp get po -lname=stork
➜ ~ kubectl -n busy-0 get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox-deployment-5d7db775bb-hbfg7 1/1 Running 0 2m25s 10.233.80.79 ip-10-13-230-144.pwx.purestorage.com
➜ ~ kp get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-13-226-123.pwx.purestorage.com Ready 10d v1.30.0 10.13.226.123 Ubuntu 20.04.2 LTS 5.4.0-107-generic docker://26.1.2
ip-10-13-227-115.pwx.purestorage.com Ready 10d v1.30.0 10.13.227.115 Ubuntu 20.04.2 LTS 5.4.0-107-generic docker://26.1.2
ip-10-13-228-109.pwx.purestorage.com Ready control-plane 10d v1.30.0 10.13.228.109 Ubuntu 20.04.2 LTS 5.4.0-107-generic docker://26.1.2
ip-10-13-230-144.pwx.purestorage.com Ready 10d v1.30.0 10.13.230.144 Ubuntu 20.04.2 LTS 5.4.0-107-generic docker://26.1.2
➜ ~ kubectl label node ip-10-13-230-144.pwx.purestorage.com px/service=stop
node/ip-10-13-230-144.pwx.purestorage.com labeled
➜ ~ kubectl -n busy-0 get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox-deployment-5d7db775bb-hbfg7 1/1 Running 0 8m13s 10.233.80.79 ip-10-13-230-144.pwx.purestorage.com
➜ ~ kubectl -n busy-0 get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox-deployment-5d7db775bb-hbfg7 1/1 Running 0 17m 10.233.80.79 ip-10-13-230-144.pwx.purestorage.com
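The health-monitoring behavior exercised above (the pod keeps running after storage is stopped on its node) can be sketched as follows. This is an illustration under stated assumptions, not the actual stork code: the `Volume` type and the `shouldEvictOnStorageOffline` helper are hypothetical names introduced here. The sketch assumes a pod is a candidate for eviction only if at least one of its volumes is not a direct access volume.

```go
package main

import "fmt"

// Volume is a hypothetical, simplified view of a pod's volume.
// It is not the real stork type.
type Volume struct {
	Name         string
	DirectAccess bool // true for direct access (for example FADA) volumes
}

// shouldEvictOnStorageOffline reports whether a pod should be deleted when
// storage goes offline on its node. Pods that use only direct access
// volumes are left alone; a pod with at least one other volume is still
// a candidate for eviction.
func shouldEvictOnStorageOffline(vols []Volume) bool {
	for _, v := range vols {
		if !v.DirectAccess {
			return true
		}
	}
	return false
}

func main() {
	fadaOnly := []Volume{{Name: "fada-vol", DirectAccess: true}}
	mixed := []Volume{
		{Name: "fada-vol", DirectAccess: true},
		{Name: "px-vol"},
	}
	fmt.Println("FADA only:", shouldEvictOnStorageOffline(fadaOnly))
	fmt.Println("mixed:", shouldEvictOnStorageOffline(mixed))
}
```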