wallarug opened this issue 2 years ago
Maybe you can have a look at what I did in this issue: https://github.com/kubeflow/pytorch-operator/issues/354#issue-999999536.
Best wishes.
This repository will be deprecated soon, please open an issue at github.com/kubeflow/training-operator
Hi Team,
I am trying to run a Kubernetes pod with multiple GPUs attached to that single pod. I can't seem to find any resources on how to do this; everything I find assumes 1 pod = 1 GPU, which isn't what I want. I want to be able to spin up, say, 2 pods with 4 GPUs each (8 GPUs total), or other combinations.
It seems this has been asked before in #219 #331 but no solid answers in there.
The YAML file I have based my testing on is from this tutorial: https://towardsdatascience.com/pytorch-distributed-on-kubernetes-71ed8b50a7ee
I have changed part of it to reflect using 2 GPUs in 1 pod.
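For reference, the change amounts to requesting two GPUs in the container's resource limits. A minimal sketch (the pod name, container name, and image below are placeholders, not from the tutorial):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-gpu-test           # placeholder name
spec:
  containers:
    - name: pytorch-worker           # placeholder container name
      image: pytorch/pytorch:latest  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 2   # ask the scheduler for 2 GPUs in this one pod
```

With this, the device plugin should expose both GPUs to the container, so `nvidia-smi` inside the pod lists two devices.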
I am seeing similar behaviour to #219: when I spin this up, only 1 GPU gets used by the test code, even though I requested 2.
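One thing worth ruling out (a guess, not a confirmed diagnosis): even when the pod is granted two GPUs, PyTorch only uses `cuda:0` unless the script explicitly spreads work across devices. A quick sanity check, assuming plain PyTorch and `nn.DataParallel` as the simplest multi-GPU wrapper:

```python
import torch
import torch.nn as nn

# How many GPUs can this process actually see inside the pod?
n_gpus = torch.cuda.device_count()
print(f"visible GPUs: {n_gpus}")

model = nn.Linear(10, 10)
if n_gpus > 1:
    # Without a wrapper like this, forward passes stay on cuda:0 even
    # though a second GPU is visible. (DistributedDataParallel is the
    # recommended option for real training; DataParallel is the shortest
    # demonstration.)
    model = nn.DataParallel(model)

if torch.cuda.is_available():
    model = model.cuda()
```

If `torch.cuda.device_count()` already reports 1 inside the pod, the problem is on the Kubernetes/device-plugin side rather than in the training script.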
Any assistance or pointing in the right direction on this would be great. Thanks!