kubeflow / pytorch-operator

PyTorch on Kubernetes
Apache License 2.0

Is a multi-GPU-per-pod setup supported in PyTorchJob? #331

Open tingweiwu opened 3 years ago

tingweiwu commented 3 years ago

If there are 2 GPUs per node, how should I set the Worker spec in the PyTorchJob: 1 replica with 2 GPUs per pod, or 2 replicas with 1 GPU per pod?
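To make the two layouts in the question concrete, here is a minimal Python sketch that builds each candidate Worker entry (the part that sits under `spec.pytorchReplicaSpecs` of a `kubeflow.org/v1` PyTorchJob) as a plain dict. The helper names `worker_spec` and `total_gpus` and the image name are hypothetical. The sketch only shows how the GPU requests differ between the two options; it does not settle whether the operator supports more than one GPU per pod.

```python
def worker_spec(replicas: int, gpus_per_pod: int,
                image: str = "example.com/train:latest") -> dict:
    """Return a hypothetical Worker entry for spec.pytorchReplicaSpecs."""
    return {
        "replicas": replicas,
        "restartPolicy": "OnFailure",
        "template": {
            "spec": {
                "containers": [{
                    # pytorch-operator looks for the container named "pytorch"
                    "name": "pytorch",
                    "image": image,  # placeholder image (assumption)
                    # GPUs are requested per pod via the NVIDIA device plugin resource
                    "resources": {"limits": {"nvidia.com/gpu": gpus_per_pod}},
                }]
            }
        },
    }


def total_gpus(spec: dict) -> int:
    """GPUs requested across all pods of one replica type."""
    container = spec["template"]["spec"]["containers"][0]
    return spec["replicas"] * container["resources"]["limits"]["nvidia.com/gpu"]


# Option A: 1 Worker pod holding both GPUs of the node.
one_pod_two_gpus = worker_spec(replicas=1, gpus_per_pod=2)
# Option B: 2 Worker pods with 1 GPU each (they may or may not land on the same node).
two_pods_one_gpu = worker_spec(replicas=2, gpus_per_pod=1)

print(total_gpus(one_pod_two_gpus), total_gpus(two_pods_one_gpu))  # prints "2 2"
```

Both options request two GPUs in total; they differ only in how those GPUs are split across pods.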

I've seen a similar issue (#219), but there are no clear instructions on whether a multi-GPU-per-pod setup is supported in PyTorchJob.

Is pytorch-operator designed for a 1-GPU-per-pod setup, even when there are multiple GPUs on the same node?

Will a multi-GPU-per-pod setup be supported?

wallarug commented 3 years ago

Hey @tingweiwu,

Did you ever get this sorted? I am struggling with the same issue.