andrewballantyne closed this issue 1 year ago
cc @jedemo @kywalker-rh @vconzola
We essentially need to ensure that if the user has indicated they do NOT want to use GPUs for model serving, their pods actively avoid GPU nodes, so they don't occupy those nodes for nothing and potentially prevent other pods from accessing them.
That behavior is not provided by OpenShift by default: a pod can land on a GPU node even if it did not actively seek one, purely by chance.
Maybe the easier thing is to let customers taint their GPU nodes. But we would need to make sure that the matching toleration is only given to the pod when we know the user wants a GPU, and taken away when we know they don't.
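To illustrate that pairing, here is a minimal sketch (the 'nvidia.com/gpu' taint key is a common convention, not something the product mandates; the cluster admin picks the actual key):

// Hypothetical sketch: the admin-side taint on GPU nodes and the matching
// pod toleration that should only be present when the user asked for a GPU.
const gpuNodeTaint = {
  key: 'nvidia.com/gpu', // assumed key; the cluster admin chooses the real one
  effect: 'NoSchedule',  // keeps pods without a matching toleration off the node
};

const gpuToleration = {
  key: 'nvidia.com/gpu',
  operator: 'Exists',
  effect: 'NoSchedule',
};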
I think this change should happen at the kube scheduler level rather than in each of our individual products. It makes sense that there should be some flag in the scheduler to prefer other nodes over ones with specialized tags.
Currently, if you don't specify that you want any GPU/HPU/other accelerator, the scheduler just treats those nodes the same as any other. I can see why that behavior might not be desirable in all cases; I think this should be an RFE to base kube.
Just to add a question here:
It seems that specifying 0 GPUs is not the same as not requesting them at all. We need to translate that into the UX because, as in other k8s flows, it might mislead the user. We need to add the logic and actively convey that having 0 GPUs is not the same as not having them at all (that could translate to a checkbox or whatever).
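Purely as an illustration of that distinction (hypothetical names, not the actual Dashboard code):

// Hypothetical sketch: model "never asked" separately from "explicitly asked for 0".
type GpuSelection =
  | { kind: 'unspecified' }               // user never saw or touched the GPU field
  | { kind: 'requested'; count: number }; // user explicitly picked a count (0 means "none")

const wantsGpu = (selection: GpuSelection): boolean =>
  selection.kind === 'requested' && selection.count > 0;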
Making that change in base Kube will be a long-term thing. We probably want something done in-house in a shorter timeframe, IMO. Also, the reply would likely be that it's what anti-affinities are for.
Whether a user asked for zero GPUs or never even saw a box offering them a GPU ... does not really make a difference. The logic is simple:
if the requested GPU count is N (where N > 0), seek a node with N available GPUs
else (all other cases), actively avoid nodes that have GPUs
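A rough sketch of that branching, in the same style as the Dashboard snippet quoted later in this thread (treat the 'nvidia.com/gpu' taint key and resource name as assumptions):

// Hypothetical sketch of the requested behavior; not actual scheduler or Dashboard code.
function gpuSchedulingPatch(requestedGpus: number) {
  if (requestedGpus > 0) {
    // Seek a node with enough GPUs: request the resource and tolerate the GPU-node taint.
    return {
      resources: { limits: { 'nvidia.com/gpu': String(requestedGpus) } },
      tolerations: [{ key: 'nvidia.com/gpu', operator: 'Exists', effect: 'NoSchedule' }],
    };
  }
  // All other cases: actively avoid GPU nodes, e.g. by adding a preferred node
  // anti-affinity term keyed on a GPU node label (see the anti-affinity sketch
  // under the feature description at the end of this thread).
  return {};
}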
Right, there's a bit more to it than that, but I think it's suboptimal to require each product to implement that logic. And it's not just NVIDIA GPUs; other accelerators are coming too. We need some global solution rather than an ad hoc set of solutions in each product.
@cfchase do we have a more global solution here? I'm not sure Kube will jump on board with this -- do we want some middle-man controller or something?
For reference, we had this Jira issue when we first enabled GPUs for notebooks: https://issues.redhat.com/browse/RHODS-540
Turns out this is already done in code -- we may want to add UI support for it.
// Add the toleration for the GPU-node taint only when the user asked for GPUs,
// so non-GPU workloads stay off tainted GPU nodes.
if (hasGpu) {
  tolerations.push({
    effect: 'NoSchedule',
    key: 'nvidia.com/gpu',
    operator: 'Exists',
  });
}
This code is used for both Notebooks and Model Servers. It's not configurable and there may be a need to flex the functionality in the future so "GPU Notebooks" and "GPU Model Servers" are not on the same resource stack... but for today this is handled in the current Dashboard code.
A couple of questions for this:
Should we still work on this?
cc @Gkrumbach07
@lucferbux
Maybe part of this is already covered by the accelerator work.
Accelerators still respect this annotation and will set the accelerator to none.
OK great, but my question was broader: now that we have a different UI and users actively select an accelerator in the code, do you think everything from this conversation is covered here?
We had two open threads:
My impression is that everything is covered by the work, but I would like to double check @Gkrumbach07
@andrewballantyne just adding you to the conversation in case I'm missing something
@lucferbux
Ability to hide GPUs in runtimes that don't support that feature (already covered)
Yep this is already implemented
Ability to control taints/tolerations/affinity and enablement of accelerators in model servers
Accelerator profiles can have an array of tolerations, which get assigned in the model serving form. However, there is no UI for setting taints. I'm also not familiar with affinity being used; I was under the impression that affinity was not needed and that taints were something an admin did through the console, not the ODH Dashboard.
So I'm guessing we are just missing affinities, but if GPU nodes are set up properly with taints, then I'm not sure why affinities are needed, if the case we are trying to avoid is scheduling on a GPU node when one was not specifically requested.
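For context, a minimal sketch of an accelerator profile carrying tolerations that the model serving form would copy onto the serving pod (field names follow the dashboard's AcceleratorProfile resource as I understand it; treat the exact schema as an assumption):

// Hypothetical example; verify field names against the actual AcceleratorProfile CRD.
const exampleProfile = {
  apiVersion: 'dashboard.opendatahub.io/v1',
  kind: 'AcceleratorProfile',
  metadata: { name: 'nvidia-gpu' },
  spec: {
    displayName: 'NVIDIA GPU',
    enabled: true,
    identifier: 'nvidia.com/gpu', // resource name requested by the workload
    tolerations: [
      { key: 'nvidia.com/gpu', operator: 'Exists', effect: 'NoSchedule' },
    ],
  },
};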
I say we close this -- affinities aren't something we support today on any resource. If we want to rework this later, we should do it for all our resources.
/close
I'm closing this with the expectation that there's no disagreement; if there is, feel free to reopen and we can discuss the value. This ticket is old, and the goal it originally set out to achieve is handled well enough for the needs of accelerators as they are handled today.
@andrewballantyne: Closing this issue.
Feature description
Model Serving is growing to support accelerators -- this could be problematic if the pods in question end up on nodes that have GPUs when we are not intending it. Flip side, we need to be able to "opt in" to those nodes when we do want to use GPUs.
This is likely affinities/anti-affinities, but I don't know a ton about this space, so this is a placeholder for future discussions.
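For illustration, a minimal sketch of the anti-affinity idea (the node label key is an assumption; NVIDIA's GPU feature discovery commonly labels GPU nodes, but the exact key depends on the cluster):

// Hypothetical pod-spec fragment: prefer to schedule away from GPU nodes
// when no accelerator was requested. The label key is an assumption.
const avoidGpuNodesAffinity = {
  nodeAffinity: {
    preferredDuringSchedulingIgnoredDuringExecution: [
      {
        weight: 100,
        preference: {
          matchExpressions: [
            { key: 'nvidia.com/gpu.present', operator: 'NotIn', values: ['true'] },
          ],
        },
      },
    ],
  },
};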
Describe alternatives you've considered
No response
Anything else?
No response