andrewballantyne closed this issue 1 year ago
cc @jedemo @kywalker-rh @vconzola
We essentially need to ensure that if the user has indicated they do NOT want to use GPUs for model serving, their pods actively avoid GPU nodes, so they don't occupy those nodes for nothing and potentially prevent other pods from accessing them.
That behavior is not provided by OpenShift by default: a pod can land on a GPU node even if it did not actively seek one, purely by chance.
Maybe the easier thing is to let customers taint their GPU nodes. But we would need to make sure that the matching toleration is only given to the pod when we know the user wants a GPU, and taken away when we know they don't.
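To illustrate that pairing, here is a minimal sketch (the 'nvidia.com/gpu' taint key is a common convention, not something the product mandates; the cluster admin picks the actual key):

// Hypothetical sketch: the admin-side taint on GPU nodes and the matching
// pod toleration that should only be present when the user asked for a GPU.
const gpuNodeTaint = {
  key: 'nvidia.com/gpu', // assumed key; the cluster admin chooses the real one
  effect: 'NoSchedule',  // keeps pods without a matching toleration off the node
};

const gpuToleration = {
  key: 'nvidia.com/gpu',
  operator: 'Exists',
  effect: 'NoSchedule',
};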
I think this change should happen at the kube scheduler level rather than in each of our individual products. It makes sense that there should be some flag in the scheduler to prefer other nodes over ones with specialized tags.
Currently, if you don't specify that you want any GPU/HPU/other accelerator, the scheduler just treats those nodes the same as any other. I can see why that behavior might not be desirable in all cases; I think this should be an RFE to base kube.
Just to add a question here:
It seems that specifying 0 GPUs is not the same as not requesting them at all. We need to translate that into the UX because, as in other k8s flows, it might mislead the user. We need to add the logic and actively convey that having 0 GPUs is not the same as not having them at all (that could translate to a checkbox or whatever).
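Purely as an illustration of that distinction (hypothetical names, not the actual Dashboard code):

// Hypothetical sketch: model "never asked" separately from "explicitly asked for 0".
type GpuSelection =
  | { kind: 'unspecified' }               // user never saw or touched the GPU field
  | { kind: 'requested'; count: number }; // user explicitly picked a count (0 means "none")

const wantsGpu = (selection: GpuSelection): boolean =>
  selection.kind === 'requested' && selection.count > 0;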
Making that change in base Kube will be a long-term thing. We probably want something done in-house in a shorter timeframe, IMO. Also, the reply would likely be that it's what anti-affinities are for.
Whether a user asked for zero GPUs or never even saw a box offering them a GPU ... does not really make a difference. The logic is simple:
if the requested GPU count is N (where N > 0), seek a node with N available GPUs
else (all other cases), actively avoid nodes that have GPUs
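A rough sketch of that branching, in the same style as the Dashboard snippet quoted later in this thread (treat the 'nvidia.com/gpu' taint key and resource name as assumptions):

// Hypothetical sketch of the requested behavior; not actual scheduler or Dashboard code.
function gpuSchedulingPatch(requestedGpus: number) {
  if (requestedGpus > 0) {
    // Seek a node with enough GPUs: request the resource and tolerate the GPU-node taint.
    return {
      resources: { limits: { 'nvidia.com/gpu': String(requestedGpus) } },
      tolerations: [{ key: 'nvidia.com/gpu', operator: 'Exists', effect: 'NoSchedule' }],
    };
  }
  // All other cases: actively avoid GPU nodes, e.g. by adding a preferred node
  // anti-affinity term keyed on a GPU node label (see the anti-affinity sketch
  // under the feature description at the end of this thread).
  return {};
}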
Right, there's a bit more to it than that, but I think it's suboptimal to require each product to implement that logic. And it's not just NVIDIA GPUs; other accelerators are coming too. We need some global solution rather than an ad hoc set of solutions in each product.
@cfchase do we have a more global solution here? I'm not sure Kube will jump on board with this -- do we want some middle-man controller or something?
For reference, we had this Jira issue when we first enabled GPUs for notebooks: https://issues.redhat.com/browse/RHODS-540
Turns out this is already done in code -- we may want to add UI support for it.
// Add the toleration for the GPU-node taint only when the user asked for GPUs,
// so non-GPU workloads stay off tainted GPU nodes.
if (hasGpu) {
  tolerations.push({
    effect: 'NoSchedule',
    key: 'nvidia.com/gpu',
    operator: 'Exists',
  });
}
This code is used for both Notebooks and Model Servers. It's not configurable and there may be a need to flex the functionality in the future so "GPU Notebooks" and "GPU Model Servers" are not on the same resource stack... but for today this is handled in the current Dashboard code.
A couple of questions for this:
Should we still work on this?
cc @Gkrumbach07
@lucferbux
Maybe part of this is already covered by the accelerator work.
Accelerators still respect this annotation and will set the accelerator to none.
OK great, but my question was broader: now that we have a different UI and users actively select an accelerator in the code, do you think everything from this conversation is covered here?
We had two open threads:
My impression is that everything is covered by the work, but I would like to double check @Gkrumbach07
@andrewballantyne just adding you to the conversation in case I'm missing something
@lucferbux
Ability to hide GPUs in runtimes that don't support that feature (already covered)
Yep this is already implemented
Ability to control taints/tolerations/affinity and enablement of accelerators in model servers
Accelerator profiles can have an array of tolerations, which get assigned in the model serving form. However, there is no UI for setting taints. I'm also not familiar with affinity being used; I was under the impression that affinity was not needed and that taints were something an admin did through the console, not the ODH Dashboard.
So I'm guessing we are just missing affinities, but if GPU nodes are set up properly with taints, then I'm not sure why affinities are needed, if the case we are trying to avoid is scheduling on a GPU node when one was not specifically requested.
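For context, a minimal sketch of an accelerator profile carrying tolerations that the model serving form would copy onto the serving pod (field names follow the dashboard's AcceleratorProfile resource as I understand it; treat the exact schema as an assumption):

// Hypothetical example; verify field names against the actual AcceleratorProfile CRD.
const exampleProfile = {
  apiVersion: 'dashboard.opendatahub.io/v1',
  kind: 'AcceleratorProfile',
  metadata: { name: 'nvidia-gpu' },
  spec: {
    displayName: 'NVIDIA GPU',
    enabled: true,
    identifier: 'nvidia.com/gpu', // resource name requested by the workload
    tolerations: [
      { key: 'nvidia.com/gpu', operator: 'Exists', effect: 'NoSchedule' },
    ],
  },
};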
I say we close this -- affinities aren't something we support today on any resource. If we want to rework this later, we should do it for all our resources.
/close
I'm closing this with the expectation that there's no disagreement; if there is, feel free to reopen and we can discuss the value. This ticket is old, and the goal it originally set out to achieve is handled well enough for the needs of accelerators as they are handled today.
@andrewballantyne: Closing this issue.
Feature description
Model Serving is growing to support accelerators -- this could be problematic if the pods in question end up on nodes that have GPUs when we are not intending it. Flip side, we need to be able to "opt in" to those nodes when we do want to use GPUs.
This is likely affinities/anti-affinities, but I don't know a ton about this space, so this is a placeholder for future discussions.
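For illustration, a minimal sketch of the anti-affinity idea (the node label key is an assumption; NVIDIA's GPU feature discovery commonly labels GPU nodes, but the exact key depends on the cluster):

// Hypothetical pod-spec fragment: prefer to schedule away from GPU nodes
// when no accelerator was requested. The label key is an assumption.
const avoidGpuNodesAffinity = {
  nodeAffinity: {
    preferredDuringSchedulingIgnoredDuringExecution: [
      {
        weight: 100,
        preference: {
          matchExpressions: [
            { key: 'nvidia.com/gpu.present', operator: 'NotIn', values: ['true'] },
          ],
        },
      },
    ],
  },
};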
Describe alternatives you've considered
No response
Anything else?
No response