kubeflow / mpi-operator

Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
https://www.kubeflow.org/docs/components/training/mpi/
Apache License 2.0

Fix: no overwrite when run launcher as worker #628

Closed: kuizhiqing closed this 4 months ago

kuizhiqing commented 4 months ago

The default implementation overwrites the environment variables NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES, which does not work in the RunLauncherAsWorker case with GPUs.
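As a rough illustration of the behaviour described above, here is a minimal, self-contained sketch, not the operator's actual code. It assumes the launcher's environment is built by appending the two NVIDIA variables with empty values, and that the fix is to skip that step when the launcher also runs as a worker. The `EnvVar` type is a simplified stand-in for the Kubernetes `corev1.EnvVar`, and `launcherEnv` and `gpuDisableEnvVars` are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// EnvVar is a simplified stand-in for corev1.EnvVar (assumption: only
// the name and value matter for this illustration).
type EnvVar struct {
	Name  string
	Value string
}

// gpuDisableEnvVars lists the two variables named in the PR description.
// Assumption: they are injected with empty values so that a launcher
// that does no GPU work does not see any devices.
var gpuDisableEnvVars = []EnvVar{
	{Name: "NVIDIA_VISIBLE_DEVICES"},
	{Name: "NVIDIA_DRIVER_CAPABILITIES"},
}

// launcherEnv is a hypothetical helper that returns the launcher's
// environment. When runLauncherAsWorker is true, the NVIDIA variables
// are left untouched so the launcher can still use its GPUs.
func launcherEnv(base []EnvVar, runLauncherAsWorker bool) []EnvVar {
	// Copy into a fresh slice so the caller's slice is never mutated.
	out := make([]EnvVar, 0, len(base)+len(gpuDisableEnvVars))
	out = append(out, base...)
	if runLauncherAsWorker {
		return out
	}
	return append(out, gpuDisableEnvVars...)
}

func main() {
	// EXAMPLE_VAR is a placeholder, not a variable the operator sets.
	base := []EnvVar{{Name: "EXAMPLE_VAR", Value: "1"}}

	for _, asWorker := range []bool{false, true} {
		fmt.Printf("runLauncherAsWorker=%v:\n", asWorker)
		for _, e := range launcherEnv(base, asWorker) {
			fmt.Printf("  %s=%q\n", e.Name, e.Value)
		}
	}
}
```

With `runLauncherAsWorker` set to false, the sketch appends both NVIDIA variables with empty values. With it set to true, only the base environment is returned, so whatever GPU settings the container runtime provides stay in effect.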

google-oss-prow[bot] commented 4 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/mpi-operator/blob/master/OWNERS)~~ [alculquicondor]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.