kubeflow-kale / kale

Kubeflow’s superfood for Data Scientists
http://kubeflow-kale.github.io
Apache License 2.0
632 stars 128 forks

Support for NVIDIA MIG GPU instances #437

Closed: femtonelson closed this issue 2 years ago

femtonelson commented 2 years ago

Hello,

NVIDIA's Multi-Instance GPU (MIG) feature, which allows a GPU to be sliced into separate instances, introduces new fully qualified names for GPU resources. In addition to the traditional "nvidia.com/gpu" resource, other GPU resources such as those shown below are now schedulable:

requests.nvidia.com/gpu: "4"
requests.nvidia.com/mig-1g.10gb: "4"
requests.nvidia.com/mig-2g.20gb: "4"
requests.nvidia.com/mig-3g.40gb: "4"

Could you kindly update the Cell Metadata Editor to support these new resources? A free-form field for specifying the GPU resource in Kale would be great, since there are many possible MIG GPU instance names.
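To illustrate what a free-form GPU resource field could feed into, here is a minimal Python sketch. It is not Kale's code: the function names and the validation pattern are hypothetical and only cover the resource names listed above. It also assumes the listed keys are quota-style names, where Kubernetes prefixes extended resources with "requests.", and that the container-level name omits that prefix.

```python
import re

# Hypothetical pattern (an assumption, not taken from Kale): accepts the
# classic "nvidia.com/gpu" name and MIG profiles like "nvidia.com/mig-1g.10gb".
_GPU_RESOURCE_RE = re.compile(r"^nvidia\.com/(gpu|mig-\d+g\.\d+gb)$")


def normalize_gpu_resource(name: str) -> str:
    """Return the container-level resource name for a user-supplied value."""
    name = name.strip()
    # Quota keys carry a "requests." prefix; container limits do not.
    if name.startswith("requests."):
        name = name[len("requests."):]
    if not _GPU_RESOURCE_RE.match(name):
        raise ValueError(f"unsupported GPU resource name: {name!r}")
    return name


def gpu_limits(name: str, count: int) -> dict:
    """Build the 'resources' fragment of a container spec for one GPU resource."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    return {"limits": {normalize_gpu_resource(name): str(count)}}


if __name__ == "__main__":
    print(gpu_limits("requests.nvidia.com/mig-1g.10gb", 4))
```

A real implementation would probably accept any extended resource name the cluster advertises rather than a fixed pattern; the check here only shows where validation could sit.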

Kubeflow notebooks allow customization via the jupyter-web-app ConfigMap. It would be great to have such a customization option in Kale as well.

Thanks for your support.

Reference: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html

femtonelson commented 2 years ago

Hello, we forked the project and tweaked the code to achieve this.

Special thanks to Melissa for her contribution.