kubeflow / pytorch-operator

PyTorch on Kubernetes
Apache License 2.0

Can I use the GPUs on a specific node to train? #310

Status: Closed (lwj1980s closed this 3 years ago)

lwj1980s commented 3 years ago

My k8s cluster:
master: 4 x GTX 1080 Ti
node1: 4 x RTX 2080
node2: 4 x RTX 2080

When I run a PyTorchJob, Kubeflow seems to select the GPUs automatically. What should I do (add a config item to deploy.yaml, or something else) if I want to use the GPUs on node1 or node2 specifically, and not the ones on master?

issue-label-bot[bot] commented 3 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/question 0.90


issue-label-bot[bot] commented 3 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
area/front-end 0.78


issue-label-bot[bot] commented 3 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
question 0.90


Jeffwan commented 3 years ago

You need to label your nodes and use node affinity so that the job's pods are scheduled only onto the nodes that have the GPUs you want.

https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/
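
For illustration, here is a minimal sketch of what that could look like in a PyTorchJob manifest. It assumes node1 and node2 have been given a label first; the label key and value (`gpu-model=rtx2080`), the job name, and the image are hypothetical placeholders, not anything defined by the operator. Note that the `Master` replica in the job spec is a PyTorchJob role and is unrelated to the cluster's master node, so the affinity rule is repeated on both replica types.

```yaml
# Assumption: the two RTX 2080 nodes were labeled beforehand, e.g.
#   kubectl label nodes node1 node2 gpu-model=rtx2080
# "gpu-model" is a hypothetical label key chosen for this sketch.
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: train-on-rtx2080            # hypothetical job name
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          affinity:
            nodeAffinity:
              # Hard requirement: only schedule onto nodes carrying the label.
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: gpu-model
                        operator: In
                        values: ["rtx2080"]
          containers:
            - name: pytorch
              image: example.com/my-training-image:latest   # placeholder image
              resources:
                limits:
                  nvidia.com/gpu: 1
    Worker:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: gpu-model
                        operator: In
                        values: ["rtx2080"]
          containers:
            - name: pytorch
              image: example.com/my-training-image:latest   # placeholder image
              resources:
                limits:
                  nvidia.com/gpu: 1
```

If a simple equality match is enough, a `nodeSelector` with the same label under each pod `spec` should have the same effect with less YAML; node affinity is the more expressive option when several label values need to be matched.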

lwj1980s commented 3 years ago

Thanks very much!