Closed houz42 closed 4 years ago
Issue-Label Bot is automatically applying the labels:
| Label | Probability |
|---|---|
| kind/question | 0.69 |
Issue-Label Bot is automatically applying the labels:
| Label | Probability |
|---|---|
| kind/feature | 0.57 |
@gaocegege, you mentioned in another issue (https://github.com/kubeflow/pytorch-operator/issues/278#issuecomment-642353290) that:
"you can have 1 Master to run local training jobs"
But I just cannot create a PyTorchJob without a worker.
Then you can try to create one worker job.
A PyTorchJob with no master? I can't create that either; the master replica count must be 1.
I tried to create a job with only one master and it works. Can you explain what you mean by not being able to create a PyTorchJob without a worker?
Is there any error during the run?
I finally realized it was my fault. I did not create "a PyTorchJob with only 1 master", but "a PyTorchJob with 1 master and 0 workers", which was rejected during validation.
Sorry for my mistake, and thanks for your patience.
As defined in the CRD, worker replicas must be >= 1 and master replicas must be == 1, so how can I create a training job that runs on a single node?
https://github.com/kubeflow/pytorch-operator/blob/eba73411bc03d70b72dcab623aa7a01c14f811d4/manifests/crd.yaml#L37
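To make the replica constraint concrete, here is a minimal Python sketch of the rules as they are described in this thread: exactly one Master, and a Worker section that may be left out but needs at least one replica when present. This is not the operator's validation code. `check_replica_specs` is a hypothetical helper, the dictionaries only imitate the shape of `pytorchReplicaSpecs`, and treating a missing `replicas` field as 1 is an assumption.

```python
# Illustrative only: mirrors the rules discussed in the thread,
# not the pytorch-operator's real validation logic.

def check_replica_specs(replica_specs):
    """Return a list of rule violations for a pytorchReplicaSpecs-like dict."""
    errors = []

    master = replica_specs.get("Master")
    if master is None:
        errors.append("Master is required")
    elif master.get("replicas", 1) != 1:  # assumption: omitted replicas means 1
        errors.append("Master replicas must be exactly 1")

    worker = replica_specs.get("Worker")
    # Worker may be omitted entirely; if present, it needs at least one replica.
    if worker is not None and worker.get("replicas", 1) < 1:
        errors.append("Worker replicas must be >= 1")

    return errors


# Single-node job: only a Master section, no Worker key at all.
master_only = {"Master": {"replicas": 1}}

# The case that was rejected: a Worker section with zero replicas.
master_zero_workers = {"Master": {"replicas": 1}, "Worker": {"replicas": 0}}

print(check_replica_specs(master_only))
print(check_replica_specs(master_zero_workers))
```

In other words, a single-node job seems to need the Worker section left out entirely rather than set to 0 replicas.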