Support for NVIDIA MIG GPU instances

kubeflow-kale / kale

Kubeflow’s superfood for Data Scientists

Apache License 2.0

632 stars, 128 forks

Hello,

With NVIDIA's Multi-Instance GPU (MIG) feature, which allows a GPU to be sliced into smaller instances, new fully qualified names have been introduced to identify GPU resources. In addition to the traditional "nvidia.com/gpu" resource, other GPU resources such as those shown below are now schedulable:

requests.nvidia.com/gpu: "4"
requests.nvidia.com/mig-1g.10gb: "4"
requests.nvidia.com/mig-2g.20gb: "4"
requests.nvidia.com/mig-3g.40gb: "4"
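To illustrate why a fixed drop-down does not scale to these names, here is a minimal Python sketch that recognizes MIG resource names. It assumes the names follow the pattern nvidia.com/mig-<compute slices>g.<memory>gb, as in the examples above; the pattern MIG_RESOURCE and the helper parse_mig_resource are hypothetical and are not part of Kale or of any NVIDIA tooling.

```python
import re

# Assumed naming scheme: nvidia.com/mig-<compute slices>g.<memory in GB>gb
# (hypothetical pattern, inferred from the resource names quoted in this issue)
MIG_RESOURCE = re.compile(r"^nvidia\.com/mig-(\d+)g\.(\d+)gb$")


def parse_mig_resource(name):
    """Return (compute_slices, memory_gb) for a MIG resource name, else None."""
    match = MIG_RESOURCE.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A name such as "nvidia.com/mig-3g.40gb" would be split into its compute-slice and memory parts, while the plain "nvidia.com/gpu" resource does not match and would need to be handled separately.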

Could you kindly update the Cell Metadata Editor to support these new resources? A free-text field for specifying the GPU resource in Kale would be ideal, as there are many possible MIG GPU instance names.

Kubeflow notebooks allow this kind of customization via the jupyter-web-app ConfigMap. It would be great to have such a customization option in Kale as well.

Thanks for your support.

Reference: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html

Hello, We forked the project and tweaked the code to achieve this.

1. Added the required MIG GPU references below to the labextension files "CellMetadataEditorDialog.tsx" and "InlineMetadata.tsx":
   nvidia.com/mig-1g.10gb : NVIDIA MIG 10GB
   nvidia.com/mig-2g.20gb : NVIDIA MIG 20GB
   nvidia.com/mig-3g.40gb : NVIDIA MIG 40GB
2. Built and published a new npm package: https://www.npmjs.com/package/kubeflow-kale-labextension0.7.mig
3. Edited the regex variable LIMITS_TAG in the Python Kale backend file nbprocessor.py, found in "/usr/local/lib/python3.6/dist-packages/kale/processors" or "/opt/conda/...../kale/processors", ... to support the new MIG GPU references.
4. Finally, updated the Dockerfile, then rebuilt and published the expleorandd/jupyterlab-kale-cpu:9c80c1a0 Docker image to Docker Hub.
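The regex change in the backend step can be sketched roughly as follows. This is not the actual LIMITS_TAG from Kale or from the fork; it is a hypothetical pattern that assumes limit tags have the form limit:<resource>:<value> and shows the key point, namely that the resource-name part must also accept digits for names like mig-1g.10gb to match. The names LIMITS_TAG_SKETCH and parse_limit_tag are made up for this illustration.

```python
import re

# Hypothetical pattern (an assumption, not copied from Kale's nbprocessor.py).
# Assumed tag form: limit:<resource name>:<value>
# The resource-name class includes 0-9 so that MIG names such as
# "nvidia.com/mig-1g.10gb" are accepted.
LIMITS_TAG_SKETCH = re.compile(r"^limit:([_a-z0-9\-./]+):([_a-zA-Z0-9.]+)$")


def parse_limit_tag(tag):
    """Return (resource, value) for a well-formed limit tag, else None."""
    match = LIMITS_TAG_SKETCH.match(tag)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Under these assumptions, a tag such as "limit:nvidia.com/mig-2g.20gb:1" is parsed into the resource name and the requested amount, and the traditional "limit:nvidia.com/gpu:1" form keeps working.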

Special thanks to Melissa for her contribution.

Support for NVIDIA MIG GPU instances #437