stormshift / support

This repo should serve as a central source for reporting issues with stormshift
GNU General Public License v3.0

Setup infranodes & add more resources to ocp4 #52

Closed rbo closed 2 years ago

rbo commented 2 years ago

Idea:

Infra node documentation: https://access.redhat.com/solutions/5034771

github-actions[bot] commented 2 years ago

Heads up @cluster/ocp3-admin - the "cluster/ocp3" label was applied to this issue.

DanielFroehlich commented 2 years ago

Why are we doing this? The current worker nodes have 16 cores / 128G RAM, that's plenty of resources. We have to watch overall HW utilisation; with other clusters coming, we are starting to hit limits.

rbo commented 2 years ago

Nope, it's not:

$ oc describe no -l node-role.kubernetes.io/worker= |grep -A 7 "Allocated resources:"
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests       Limits
  --------           --------       ------
  cpu                12582m (81%)   18050m (116%)
  memory             48224Mi (37%)  74792Mi (58%)
  ephemeral-storage  0 (0%)         0 (0%)
  hugepages-2Mi      0 (0%)         0 (0%)
--
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests       Limits
  --------           --------       ------
  cpu                15193m (98%)   19700m (127%)
  memory             64547Mi (50%)  77484Mi (60%)
  ephemeral-storage  0 (0%)         0 (0%)
  hugepages-2Mi      0 (0%)         0 (0%)
--
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                3594m (23%)  5 (32%)
  memory             9816Mi (7%)  9440Mi (7%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
--
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                3065m (87%)   10800m (308%)
  memory             7861Mi (52%)  17252Mi (115%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)

We don't have much workload on it, yet it's hard to update or change anything because the cluster is busy with itself (OCS, Logging, ...).
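
To compare the nodes without reading each block by eye, the output above can be summarised with a small script. This is only a sketch: the function names (`parse_allocated`, `cpu_pressure`) and the 80% threshold are illustrative assumptions, not something agreed in this issue, and the sample text is copied from the `oc describe` output in this comment (only the cpu and memory rows).

```python
import re

# Matches rows such as "  cpu   12582m (81%)   18050m (116%)".
# Only the cpu and memory rows are considered; other resources are ignored.
ROW = re.compile(
    r"^\s*(cpu|memory)\s+\S+\s+\((\d+)%\)\s+\S+\s+\((\d+)%\)\s*$"
)


def parse_allocated(text):
    """Return one dict per node: {resource: (request_pct, limit_pct)}."""
    nodes = []
    for line in text.splitlines():
        if line.startswith("Allocated resources:"):
            nodes.append({})  # a new node block starts here
            continue
        match = ROW.match(line)
        if match and nodes:
            resource, req, lim = match.groups()
            nodes[-1][resource] = (int(req), int(lim))
    return nodes


def cpu_pressure(nodes, threshold=80):
    """Indices of nodes whose CPU requests are at or above `threshold` percent.

    The default of 80 is an arbitrary, hypothetical cut-off.
    """
    return [i for i, n in enumerate(nodes) if n.get("cpu", (0, 0))[0] >= threshold]


# cpu/memory rows copied from the output posted in this issue.
SAMPLE = """\
Allocated resources:
  cpu                12582m (81%)   18050m (116%)
  memory             48224Mi (37%)  74792Mi (58%)
--
Allocated resources:
  cpu                15193m (98%)   19700m (127%)
  memory             64547Mi (50%)  77484Mi (60%)
--
Allocated resources:
  cpu                3594m (23%)  5 (32%)
  memory             9816Mi (7%)  9440Mi (7%)
--
Allocated resources:
  cpu                3065m (87%)   10800m (308%)
  memory             7861Mi (52%)  17252Mi (115%)
"""

nodes = parse_allocated(SAMPLE)
print(cpu_pressure(nodes))  # node indices, in the order oc listed them
```

With the figures from this issue, three of the four workers have CPU requests above 80%, while memory requests stay at or below 52% on all of them. So CPU requests are the bottleneck, not RAM.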

It might be useful to join some clusters, so that more resources are available for workloads and fewer are spent on control planes. For example, join ocp5 & ocp4, because AI/ML & VM workloads are more fun with OCS. Just an idea; we have to discuss it in detail on the next stormshift call or via gchat.

github-actions[bot] commented 2 years ago

Heads up @cluster/ocp4-admin - the "cluster/ocp4" label was applied to this issue.

DanielFroehlich commented 2 years ago

Masters are not schedulable and an additional GPU worker node is up and running, so I am closing this issue for now.