redhat-ai-services / ai-accelerator

The AI Accelerator is a template project for setting up Red Hat OpenShift AI using GitOps

Check Cluster CPU/Memory Resources in Bootstrap #47

Open strangiato opened 1 month ago

strangiato commented 1 month ago

When deploying a cluster with the bootstrap script, the nodes often do not have enough resources to deploy everything successfully.

It would be great if the bootstrap script checked the available CPU and memory and made sure that both exceed a set minimum. If they don't, it could print a warning and give you the option to continue (allowing you to correct the issue first by spinning up additional nodes if you wish).

strangiato commented 1 month ago

An initial estimate is that about 20 CPUs and 50 GiB of memory are required.
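
A rough sketch of what such a check might look like, written in Python for illustration (it is not taken from the existing bootstrap script). It assumes `oc` is on the PATH and logged in to the cluster, and it sums each schedulable node's `status.allocatable` values, which only approximates "available" because resources already requested by running pods are not subtracted. The function names and the two threshold constants are hypothetical; the thresholds simply reuse the estimate above.

```python
import json
import subprocess

# Hypothetical thresholds, taken from the rough estimate in this thread.
MIN_CPU_CORES = 20.0
MIN_MEMORY_GIB = 50.0

# Common suffixes used by Kubernetes memory quantities (not exhaustive).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '3500m' or '4' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory_bytes(quantity):
    """Convert a memory quantity such as '16Gi' or '32986208Ki' to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(float(quantity))  # plain number of bytes


def total_allocatable(nodes):
    """Sum allocatable CPU (cores) and memory (GiB) over schedulable nodes."""
    cpu = 0.0
    mem_bytes = 0
    for node in nodes.get("items", []):
        if node.get("spec", {}).get("unschedulable"):
            continue  # skip cordoned nodes
        alloc = node["status"]["allocatable"]
        cpu += parse_cpu(alloc["cpu"])
        mem_bytes += parse_memory_bytes(alloc["memory"])
    return cpu, mem_bytes / 2**30


def check_resources(cpu, mem_gib):
    """Return one warning string per resource that is below its minimum."""
    warnings = []
    if cpu < MIN_CPU_CORES:
        warnings.append(f"CPU: {cpu:.1f} cores allocatable, want {MIN_CPU_CORES:.0f}")
    if mem_gib < MIN_MEMORY_GIB:
        warnings.append(f"Memory: {mem_gib:.1f} GiB allocatable, want {MIN_MEMORY_GIB:.0f}")
    return warnings


def main():
    """Query the cluster, warn if it looks too small, and ask whether to go on."""
    result = subprocess.run(
        ["oc", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    warnings = check_resources(*total_allocatable(json.loads(result.stdout)))
    for warning in warnings:
        print("WARNING:", warning)
    if warnings and input("Continue anyway? [y/N] ").strip().lower() != "y":
        raise SystemExit(1)
```

Calling `main()` before the deployment steps would give the warn-and-confirm behaviour described above. Keeping the parsing and threshold logic in separate functions means they can be tested without a live cluster.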

carlmes commented 1 month ago

Great idea. While installing on a default AWS OpenShift cluster, I get the following issues in ArgoCD:

image
carlmes commented 1 month ago

By default, it looks like no dedicated worker nodes are scaled up on the demo.redhat.com instance, meaning that the hyper-converged worker/control plane is running out of capacity during provisioning.

image