At least the web and worker-beat pods use `emptyDir` volumes, which consume `ephemeral-storage` on the node the pod is scheduled on. Since we have not specified any ephemeral-storage resource requests and limits for these containers, we risk that the pod gets evicted, crashes, or exhausts storage on the node.
Currently, my pods get evicted and I get a warning when the pod gets scheduled on a node with too little ephemeral storage available:
```console
$ kubectl get events --field-selector involvedObject.name=worker-beat-7898d974fc-sb9xz
LAST SEEN TYPE REASON OBJECT MESSAGE
46m Warning FailedScheduling pod/worker-beat-7898d974fc-sb9xz 0/6 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 No preemption victims found for incoming pod..
46m Warning FailedScheduling pod/worker-beat-7898d974fc-sb9xz 0/6 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 No preemption victims found for incoming pod..
45m Normal Scheduled pod/worker-beat-7898d974fc-sb9xz Successfully assigned invenio-dev/worker-beat-7898d974fc-sb9xz to kth-prod-1-worker-7a7516d2-v8vbc
45m Normal SuccessfulAttachVolume pod/worker-beat-7898d974fc-sb9xz AttachVolume.Attach succeeded for volume "pvc-801c874c-37a9-4520-a0e8-c59606c9d09a"
45m Normal Pulling pod/worker-beat-7898d974fc-sb9xz Pulling image "ghcr.io/inveniosoftware/demo-inveniordm/demo-inveniordm@sha256:2193abc2caec9bc599061d6a5874fd2d7d201f55d1673a545af0a0406690e8a4"
44m Warning Evicted pod/worker-beat-7898d974fc-sb9xz The node was low on resource: ephemeral-storage. Threshold quantity: 994154920, available: 759960Ki.
44m Normal Pulled pod/worker-beat-7898d974fc-sb9xz Successfully pulled image "ghcr.io/inveniosoftware/demo-inveniordm/demo-inveniordm@sha256:2193abc2caec9bc599061d6a5874fd2d7d201f55d1673a545af0a0406690e8a4" in 1m2.20910036s (1m2.209116986s including waiting)
44m Normal Created pod/worker-beat-7898d974fc-sb9xz Created container worker-beat
44m Normal Started pod/worker-beat-7898d974fc-sb9xz Started container worker-beat
44m Normal Killing pod/worker-beat-7898d974fc-sb9xz Stopping container worker-beat
44m Warning ExceededGracePeriod pod/worker-beat-7898d974fc-sb9xz Container runtime did not kill the pod within specified grace period.
```
I suggest we add `ephemeral-storage` requests and limits on all containers that use `emptyDir` volumes. I can whip up a PR for it, but I need your help to identify a reasonable size to set as the request and limit.
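For reference, a sketch of what this could look like on the container spec. The `1Gi`/`2Gi` values below are placeholders for illustration, not recommendations, and the container name is taken from the events above:

```yaml
# Sketch only: sizes are placeholders to be replaced once we agree on values.
containers:
  - name: worker-beat
    resources:
      requests:
        ephemeral-storage: "1Gi"   # scheduler only places the pod on nodes with this much free
      limits:
        ephemeral-storage: "2Gi"   # kubelet evicts the pod if its usage exceeds this
```

Alternatively (or additionally), Kubernetes supports a `sizeLimit` on the `emptyDir` volume itself, which caps that volume specifically rather than the container's total ephemeral-storage usage.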