Closed. Xaenalt closed this 2 years ago
I don't think we should add this as part of the default functionality for ODH JupyterHub. It would be better as an optional feature that users can enable and configure through jupyterhub-singleuser-profiles, in the same way that they can apply node tolerations for GPU notebooks.
That's fair; this was an attempt to solve the issue listed in the JIRA. There's also the approach we've recommended, node taints/tolerations. This tries to thread the needle between both options.
It's worth noting that if you request GPUs, you still get them; this just makes non-GPU workloads prefer non-GPU nodes.
Made 2 changes:
- Now only adds the affinity if the user doesn't specify any GPUs in requests
- Merges the affinity dict with any that might already exist
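The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names (`requests_gpu`, `merge_affinity`, `affinity_for_notebook`), the GPU resource name `nvidia.com/gpu`, and the node label `nvidia.com/gpu.present` are all assumptions chosen for the example.

```python
import copy

# Assumptions for illustration only: the GPU resource name and the node label
# that marks GPU nodes may differ on a given cluster.
GPU_RESOURCE = "nvidia.com/gpu"
GPU_NODE_LABEL = "nvidia.com/gpu.present"

# Soft preference for nodes that are not labelled as GPU nodes.
NON_GPU_PREFERENCE = {
    "nodeAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": [
            {
                "weight": 1,
                "preference": {
                    "matchExpressions": [
                        {"key": GPU_NODE_LABEL, "operator": "NotIn", "values": ["true"]}
                    ]
                },
            }
        ]
    }
}


def requests_gpu(resources):
    """Return True if a GPU appears in the pod's resource requests or limits."""
    for section in ("requests", "limits"):
        count = ((resources or {}).get(section) or {}).get(GPU_RESOURCE, 0)
        if int(count) > 0:
            return True
    return False


def merge_affinity(existing, extra):
    """Merge two affinity dicts without mutating either input.

    Nested dicts are merged recursively and lists are concatenated, so
    affinity terms that already exist are kept alongside the new ones.
    """
    merged = copy.deepcopy(existing) if existing else {}
    for key, value in extra.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merged[key] = merge_affinity(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            merged[key] = current + copy.deepcopy(value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def affinity_for_notebook(existing_affinity, resources):
    """Add the non-GPU preference only when no GPU is requested."""
    if requests_gpu(resources):
        return copy.deepcopy(existing_affinity) if existing_affinity else {}
    return merge_affinity(existing_affinity, NON_GPU_PREFERENCE)
```

Concatenating lists rather than replacing them is what keeps any preferred terms a user has already configured.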
@Xaenalt The ability to apply nodeAffinity to notebook pods is already a supported feature of jupyterhub-singleuser-profiles (JSP).
The docs for enabling and configuring it are located here: https://github.com/opendatahub-io/jupyterhub-singleuser-profiles/blob/master/docs/configuration.md
Opened https://github.com/opendatahub-io/odh-manifests/pull/556 as an alternate solution
The main difference between these approaches is that this one adds the affinity conditionally, only if the user doesn't request a GPU. That might be preferable to minimize confusion for a user who is debugging something, though the two approaches should be identical in functionality.
I don't think we should automatically apply this in the jupyterhub_config for all non-GPU configs. I still believe this is better supported in JSP, so that a user can enable it by choice for non-GPU pods with a custom JSP configmap. We already support user configuration of affinity, and this conditional CPU-only behavior should be added in JSP to support non-GPU affinity.
Causes notebooks to prefer non-GPU nodes unless a GPU is explicitly requested or no non-GPU nodes are available
Related Issues and Dependencies
This is a solution proposed in https://issues.redhat.com/browse/RHODS-3074
This introduces a breaking change
This Pull Request implements
This lets pods always prefer non-GPU nodes unless a GPU is explicitly requested. As mentioned in the JIRA, this is probably what users expect. This shouldn't break anything; I tested it on my cluster and it worked. If all CPU nodes are full (or the pod otherwise can't be scheduled onto them), it will go onto a GPU node.
There may be a more elegant way to do this, but that's the gist of the change
Description
Using the preferredDuringSchedulingIgnoredDuringExecution affinity, a notebook will always prefer non-GPU nodes; however, adding a GPU to its resource requests will force it onto a GPU node.
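For reference, a preference of this kind might look roughly like the sketch below, written as the Python dict form of a Kubernetes affinity. The function name `non_gpu_node_affinity` and the node label key `nvidia.com/gpu.present` are assumptions for illustration; the label that actually identifies GPU nodes depends on the cluster, and this is not necessarily the exact structure the PR adds.

```python
def non_gpu_node_affinity(gpu_label="nvidia.com/gpu.present", weight=1):
    """Build a soft node affinity that prefers nodes not labelled as GPU nodes.

    gpu_label is an assumed label key; substitute whatever label marks
    GPU nodes on the target cluster.
    """
    return {
        "nodeAffinity": {
            # "preferred" makes this a scheduling hint, not a hard requirement,
            # so the pod can still land on a GPU node if nothing else fits.
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,
                    "preference": {
                        "matchExpressions": [
                            {
                                "key": gpu_label,
                                # NotIn also matches nodes that lack the label.
                                "operator": "NotIn",
                                "values": ["true"],
                            }
                        ]
                    },
                }
            ]
        }
    }
```

Because the term is only a preference, it does not by itself place GPU notebooks; those reach GPU nodes because the GPU resource request can only be satisfied there.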