kubeflow / pytorch-operator

PyTorch on Kubernetes
Apache License 2.0
306 stars, 143 forks

cleanPodPolicy Set to Running should clean Running pod #260

Open xrmzju opened 4 years ago

xrmzju commented 4 years ago

https://github.com/kubeflow/pytorch-operator/blob/047cf0f41e68e030158f532017a226c18827a660/pkg/controller.v1/pytorch/job.go#L160 At this line the controller simply ignores the Running policy for now, so a pod that is still running is not cleaned up.

issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
bug 0.57


xrmzju commented 4 years ago

@gaocegege

gaocegege commented 4 years ago

When a PyTorchJob has failed, all of its replicas should have failed too, so there is no difference between None and Running. That is why we ignore it. Do you have a problem with this behavior?

/cc @johnugeorge

xrmzju commented 4 years ago

> When a PyTorchJob has failed, all of its replicas should have failed too, so there is no difference between None and Running. That is why we ignore it. Do you have a problem with this behavior?
>
> /cc @johnugeorge

But in my case that is not what happens. The master pod is still Running after both workers have ended in Error:

pytorch-test-master-0                                               1/1     Running                0          4m15s
pytorch-test-worker-0                                               0/1     Error                  0          4m15s
pytorch-test-worker-1                                               0/1     Error                  0          4m15s

gaocegege commented 4 years ago

@johnugeorge WDYT