nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

What should the minimum cpu or memory request be? #602

Open naved001 opened 3 weeks ago

naved001 commented 3 weeks ago

All user namespaces have a limitrange that sets default limits and requests for cpu and memory if the pod spec does not specify them.

However, one can explicitly request 0 cpu and/or memory and the pod will still be scheduled. For billing purposes such a pod is not captured, because it requested 0 resources; in reality, as long as the cluster has spare resources, it will keep running and can consume up to the limit specified (either in the pod spec or from the default limitrange).

This came up when I was examining a pod that wasn't captured in the invoice with the following specs:

    resources:
      limits:
        cpu: 500m
        memory: 512Mi
        nvidia.com/gpu: "0"
      requests:
        memory: "0"
        nvidia.com/gpu: "0"

Once the cluster has higher utilization, this may become a moot point because a pod like this would be the first to get kicked out. I believe this user did this by mistake, so maybe we should reach out to them? They had hundreds of pods in state "UnexpectedAdmissionError" before they got a running one.

naved001 commented 3 weeks ago

If we wanted to enforce a minimum, we could update the limitranges for the namespace to specify the minimum cpu and memory:

https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-constraint-namespace/#create-a-limitrange-and-a-pod

naved001 commented 3 weeks ago

I created a limit range that specifies the minimum size for a pod:

naved@computer ~ % oc get limitrange -n naved-test -o yaml | yq '.items[].spec'
{
  "limits": [
    {
      "default": {
        "cpu": "2",
        "memory": "1Gi"
      },
      "defaultRequest": {
        "cpu": "1",
        "memory": "512Mi"
      },
      "min": {
        "cpu": "500m",
        "memory": "256Mi"
      },
      "type": "Container"
    }
  ]
}
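
For anyone who wants to reproduce this, the spec above would correspond to a manifest roughly like the one below. This is only a sketch: the object name `min-resources` is made up for illustration and is not necessarily what exists in `naved-test`; the values are copied from the output above.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: min-resources      # hypothetical name, not taken from the cluster
  namespace: naved-test
spec:
  limits:
    - type: Container
      # Limits applied when a container specifies none
      default:
        cpu: "2"
        memory: 1Gi
      # Requests applied when a container specifies none
      defaultRequest:
        cpu: "1"
        memory: 512Mi
      # Smallest request a container is allowed to make
      min:
        cpu: 500m
        memory: 256Mi
```

It could be applied with `oc apply -f <file>`.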

With that in place, when I attempt to create a pod with zero memory and cpu, it gets rejected with an appropriate message:

Error from server (Forbidden): error when creating "sample-pod.yaml": pods "test-pod-zero-memory-and-cpu" is forbidden: [minimum cpu usage per Container is 500m, but request is 0, minimum memory usage per Container is 256Mi, but request is 0]
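
A pod along these lines should trigger that rejection. This is a sketch, not the actual `sample-pod.yaml`: only the pod name comes from the error message; the container name, image, and command are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-zero-memory-and-cpu
  namespace: naved-test
spec:
  containers:
    - name: test                                           # hypothetical container name
      image: registry.access.redhat.com/ubi9/ubi-minimal   # assumed image; any image would do
      command: ["sleep", "infinity"]
      resources:
        # Explicit zero requests, below the limitrange minimum
        requests:
          cpu: "0"
          memory: "0"
        limits:
          cpu: 500m
          memory: 512Mi
```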

joachimweyl commented 2 weeks ago

While we're in this section, should we update the default to match the SU ratio?