opea-project / GenAIComps

GenAI components at micro-service level; GenAI service composer to create mega-service
Apache License 2.0

Run time validation tool to ascertain user provided pipeline resource needs can be met in the deployment infrastructure #207

Closed mkbhanda closed 1 week ago

mkbhanda commented 1 month ago

For example, if the deployment infrastructure is a Kubernetes cluster and the user has requested GPUs or special-purpose accelerators that do not exist, promptly return a failure message. Occasionally there may be inadequate resources to meet a request; then either the cluster must grow or the request must fail to deploy.
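The requested validation could be sketched roughly as below: compare the extended resources a pipeline asks for against what the cluster's nodes actually advertise, and fail fast instead of leaving pods Pending. This is illustrative only; the function name and node-data shape are assumptions, not GenAIComps or Kubernetes client APIs.

```python
# Hypothetical pre-deployment check: "requested" maps resource names to
# counts, "nodes" is a list of dicts with an "allocatable" mapping (as one
# might extract from the Kubernetes API). Names here are illustrative.

def check_resources(requested: dict, nodes: list[dict]) -> list[str]:
    """Return a list of error messages; an empty list means the request can fit."""
    errors = []
    for resource, amount in requested.items():
        # Total of this resource across all nodes that advertise it.
        available = sum(n.get("allocatable", {}).get(resource, 0) for n in nodes)
        if available == 0:
            errors.append(f"no node in the cluster provides '{resource}'")
        elif available < amount:
            errors.append(
                f"requested {amount} x '{resource}' but only {available} allocatable"
            )
    return errors

# Example: a Gaudi-specific request on a CPU-only cluster is rejected promptly.
cpu_only_nodes = [{"allocatable": {"cpu": 64}}, {"allocatable": {"cpu": 32}}]
print(check_resources({"habana.ai/gaudi": 2}, cpu_only_nodes))
# prints ["no node in the cluster provides 'habana.ai/gaudi'"]
```

Note this only checks cluster totals; a real scheduler also has to fit the request onto a single node, so this is a necessary but not sufficient check.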

feng-intel commented 3 weeks ago

Good suggestion. I will report this feature.

kevinintel commented 2 weeks ago

@mkbhanda, please suggest correct assignees

Thanks

mkbhanda commented 1 week ago

Scheduling failures are well summarized in https://www.howtouselinux.com/post/kubernetes-pod-pending. I was thinking of something quick, particularly for when a user brings, say, a Gaudi-specific image and wants to launch it on a cluster without Gaudi nodes, generalizing this to specialized hardware. Something like an admission control webhook. That said, how quickly does Docker or Kubernetes fail when resources are inappropriate or inadequate? If fast, perhaps the feature is not worth constructing. Let me get Iris to comment to help de/prioritize this enhancement request.
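The webhook idea mentioned above might look roughly like this: a validating admission webhook receives an AdmissionReview for a Pod and denies it when the Pod requests an extended resource no node advertises. This is a minimal sketch, assuming `cluster_resources` (the set of resource names nodes advertise) is refreshed from the Kubernetes API elsewhere; the function name and wiring are hypothetical, not an official client library API.

```python
# Sketch of validating-webhook logic (illustrative; not a real controller).
# "admission_review" is the decoded AdmissionReview JSON body Kubernetes
# POSTs to a validating webhook; "cluster_resources" is an assumed, externally
# maintained set of resource names that at least one node advertises.

def review_pod(admission_review: dict, cluster_resources: set) -> dict:
    pod = admission_review["request"]["object"]
    missing = set()
    for container in pod["spec"]["containers"]:
        limits = container.get("resources", {}).get("limits", {})
        for resource in limits:
            # Extended resources are namespaced, e.g. "habana.ai/gaudi";
            # plain "cpu"/"memory" are left to the scheduler.
            if "/" in resource and resource not in cluster_resources:
                missing.add(resource)
    allowed = not missing
    response = {"uid": admission_review["request"]["uid"], "allowed": allowed}
    if not allowed:
        response["status"] = {
            "message": f"cluster has no nodes providing: {sorted(missing)}"
        }
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview",
            "response": response}
```

The upside versus the default behavior is an immediate, explicit error at `kubectl apply` time rather than a Pending pod the user has to `kubectl describe`.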

The broader problem of when to expand / contract a cluster involves which type of node to add/drop based on the existing workload needs, node utilization, and workload activity patterns: https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling. Does cluster-api monitor Gaudi usage metrics? My guess is not, or perhaps not adequately. And must CSPs support Cluster API? Intel Developer Cloud is considering supporting it.

Also what should the policy be? Fail to schedule a workload or try to expand the cluster with the right type and number of nodes to meet a workload requirement?

And let me ask Sasha about scale testing for this aspect. Growing/shrinking a cluster may be future work.

kad commented 1 week ago

As already mentioned in the Kubernetes documentation, if a workload specifies any extended resource that is not present in the cluster, the pod will sit in the Pending state with an explanation that the resource is not available. Deploying admission hooks of any kind is possible, but it is not a "standard" setup and thus requires additional effort from cluster administrators. It can be done, but the benefit of getting an error message vs. a "pending" state is not worth the effort.

Publicly available cluster autoscalers support extended resources and accelerators quite badly; they often have hardcoded logic related to a specific CSP (to fetch metadata about available accelerators from CSP-specific tags) or to a specific accelerator (e.g. special code for Nvidia; special code for Habana was seen in Karpenter)... In my opinion, it would be better to have a per-vendor cluster autoscaler for accelerator devices, but that is probably not something the SIG-Scheduling/autoscalers community would agree on, so we might later propose Gaudi-specific patches to Karpenter or the k8s-sigs cluster autoscaler.

mkbhanda commented 1 week ago

Based on all input, closing this issue.