Open airpaio opened 1 year ago
I came across #271 which seems related. The first comment there mentions using templates. Still trying to work through this, but maybe these docs will help https://docs.posit.co/job-launcher/kube.html#kube-templating
Whoops, apologies for the delay; I missed this.
Is the context here more for Workbench or more for Connect?
I have used this functionality with Workbench - when `default-nvidia-gpus`, `max-nvidia-gpus`, and friends are enabled inside of Workbench, we display a selector that allows permitted users to decide whether their session uses GPUs / how many (presuming any nodes have more than 1 GPU available). It ultimately finds its way into the job as `resources`:
https://docs.posit.co/job-launcher/kube.html#kube-profiles
In my testing, this was sufficient to get a GPU job scheduled properly. If this is not the case, we would love to learn more about what is going wrong! I definitely understood the `tolerations` shown in your example to be "overkill" in some sense (i.e. in a cluster with many different types of GPUs, make sure it runs on this one). Using templating or `job-json-overrides` today (which you reference) is unfortunately not a fantastic answer as it would require all jobs for a given user to use those tolerations.
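To make the difference between the two approaches concrete, here is a minimal Python sketch of the pod-spec pieces involved. It assumes the GPU count chosen in the selector ends up as an `nvidia.com/gpu` limit under the container's `resources`, and it uses the toleration shown in the NVIDIA device plugin README. The container name `session` and the helper names are hypothetical, and the exact spec the Launcher generates may differ.

```python
import json

# Extended resource name advertised by the NVIDIA k8s-device-plugin.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_container_resources(gpus: int) -> dict:
    """Return the container `resources` stanza for a given GPU count."""
    if gpus < 0:
        raise ValueError("gpus must be non-negative")
    if gpus == 0:
        return {}  # no GPU requested, so no GPU limit is set
    # Extended resources such as GPUs are requested through `limits`.
    return {"limits": {GPU_RESOURCE: gpus}}


def gpu_toleration() -> dict:
    """Toleration from the NVIDIA device plugin README (tainted GPU nodes)."""
    return {"key": GPU_RESOURCE, "operator": "Exists", "effect": "NoSchedule"}


def pod_spec_fragment(gpus: int, tolerate_gpu_taint: bool = False) -> dict:
    """Build an illustrative pod-spec fragment (hypothetical container name)."""
    spec = {
        "containers": [
            {"name": "session", "resources": gpu_container_resources(gpus)}
        ]
    }
    if tolerate_gpu_taint:
        # Only needed when GPU nodes carry a matching NoSchedule taint.
        spec["tolerations"] = [gpu_toleration()]
    return spec


if __name__ == "__main__":
    print(json.dumps(pod_spec_fragment(1, tolerate_gpu_taint=True), indent=2))
```

The resource limit alone is what lets the scheduler place the pod on a node with a free GPU. The toleration only matters if the GPU nodes are tainted, which is why applying it to every job for a user through templating or `job-json-overrides` is heavier than the per-session selector.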
Connect has not exposed any of this functionality to date. It is possible to run all Connect jobs with GPUs (i.e. using the default), but not to select which jobs run with GPUs and which do not. If this is coming up as an important piece of functionality, we would love to learn more details about why so we can help prioritize the work necessary!
Does anyone have any experience running Launcher sessions with GPUs? I know we can set Launcher profiles (`server.profiles.launcher.kubernetes.profiles.conf`) with `default-nvidia-gpus`, but what else is required? In a different project we would configure GPU jobs with tolerations similar to what is seen here https://github.com/NVIDIA/k8s-device-plugin#running-gpu-jobs. Just wondering how this might translate into the Launcher profiles config for Workbench/Connect?
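For reference, a minimal sketch of what the profiles file could contain, rendered with Python's `configparser`. It assumes the usual profiles layout of a `[*]` section for all users and `[@group]` sections for group overrides; the group name `gpu-users` and the GPU counts are hypothetical, so check the section syntax and values against the kube-profiles documentation before relying on it.

```python
import configparser
import io


def render_profiles(max_gpus: int, gpu_group: str = "gpu-users") -> str:
    """Render launcher.kubernetes.profiles.conf-style text (illustrative only)."""
    conf = configparser.ConfigParser()
    # [*] applies to every user: no GPUs by default and none selectable.
    conf["*"] = {"default-nvidia-gpus": "0", "max-nvidia-gpus": "0"}
    # [@<group>] overrides the limits for members of a (hypothetical) group.
    conf[f"@{gpu_group}"] = {
        "default-nvidia-gpus": "0",
        "max-nvidia-gpus": str(max_gpus),
    }
    buf = io.StringIO()
    # Write key=value pairs without spaces around the delimiter.
    conf.write(buf, space_around_delimiters=False)
    return buf.getvalue()


PROFILES_TEXT = render_profiles(max_gpus=2)

if __name__ == "__main__":
    print(PROFILES_TEXT)
```

With limits like these, only members of the group would be offered a GPU selector, and sessions that leave it at zero would not request a GPU at all.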