Closed (rbo closed 2 years ago)
Heads up @cluster/ocp3-admin - the "cluster/ocp3" label was applied to this issue.
Why are we doing this? The current worker nodes have 16 cores / 128G RAM, that's plenty of resources. We have to watch overall HW utilization; with other clusters coming, we are starting to hit limits.
Nope, it's not:
$ oc describe no -l node-role.kubernetes.io/worker= |grep -A 7 "Allocated resources:"
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 12582m (81%) 18050m (116%)
memory 48224Mi (37%) 74792Mi (58%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
--
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 15193m (98%) 19700m (127%)
memory 64547Mi (50%) 77484Mi (60%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
--
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 3594m (23%) 5 (32%)
memory 9816Mi (7%) 9440Mi (7%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
--
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 3065m (87%) 10800m (308%)
memory 7861Mi (52%) 17252Mi (115%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
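To make the output above quicker to read, here is a minimal sketch that pulls out only the CPU request percentage per node and counts how many nodes are close to full. The function name `summarize_cpu_requests` and the 80% threshold are assumptions for illustration, not something from this cluster's tooling. The sample lines are copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads "Allocated resources" text on stdin and reports
# the CPU *request* percentage for each node, in the order the nodes appear.
# THRESHOLD (default 80) is an illustrative assumption, not an official limit.
summarize_cpu_requests() {
  awk -v threshold="${THRESHOLD:-80}" '
    $1 == "cpu" {
      pct = $3                      # third field looks like "(81%)"
      gsub(/[()%]/, "", pct)        # strip parentheses and percent sign
      n++
      printf "node %d: cpu requests %s%%\n", n, pct
      if (pct + 0 >= threshold + 0) hot++
    }
    END { printf "%d of %d nodes at or above %d%% cpu requests\n", hot, n, threshold }
  '
}

# Sample input taken from the output in this thread. On a live cluster the
# same function could be fed from the oc command shown above instead.
summarize_cpu_requests <<'EOF'
cpu 12582m (81%) 18050m (116%)
memory 48224Mi (37%) 74792Mi (58%)
cpu 15193m (98%) 19700m (127%)
cpu 3594m (23%) 5 (32%)
cpu 3065m (87%) 10800m (308%)
EOF
```

With the numbers from this thread, three of the four workers are at or above 80% CPU requests, even though memory requests stay at or below 52%. So the nodes are CPU-bound by requests, not short on RAM.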
We don't have much workload on it, yet it's hard to update or change anything because the cluster is busy with itself (OCS, logging, ...).
It might be useful to join some clusters, so that more resources are available for workloads and fewer are spent on control planes. For example: join ocp5 & ocp4, because AI/ML & VM workload is more fun with OCS. Just an idea; we have to discuss it in detail on the next stormshift call or via gchat.
Heads up @cluster/ocp4-admin - the "cluster/ocp4" label was applied to this issue.
The masters are not schedulable, and an additional GPU worker node is up and running. I am closing this issue for now.
Idea:
appNN.ocp4... nodes, each with a GPU. (Remove node gpu, and move the GPU from compute-0.)
Infra node documentation: https://access.redhat.com/solutions/5034771