@johnugeorge
@zlcnju Thanks for finding this issue.
Now, the PyTorch operator will not create the worker pods until the master pod is running. However, when kube-batch is used, it will not schedule anything until all pods have been created. So with gang scheduling, the master will always be stuck in the Pending state.
The quick fix is to exclude the master pod from the batch. This way, the master is created first, and the podgroup and worker pods are created after the master pod is running. With this solution, there is a possibility that master pods consume all the resources.
Another solution is to add an init container to the worker pods. In this solution, all pods (master and workers) are created together, but each worker pod's init container checks that the master pod is ready. This solution is better, but it brings in an extra init container.
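For concreteness, here is a minimal sketch of what such an injected init container could look like on the operator side. The container name, image, and wait command are illustrative assumptions, not the actual implementation:

```go
package pytorch

import corev1 "k8s.io/api/core/v1"

// masterWaitInitContainer returns an init container that blocks the
// worker pod until the master's service hostname resolves, i.e. until
// the master pod exists and is reachable. The name, image, and command
// here are assumptions for illustration only.
func masterWaitInitContainer(masterAddr string) corev1.Container {
	return corev1.Container{
		Name:  "init-pytorch",
		Image: "busybox",
		Command: []string{
			"sh",
			"-c",
			"until nslookup " + masterAddr + "; do echo waiting for master; sleep 2; done",
		},
	}
}
```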
Instead of worker pods waiting to be created until the master pod is running, you propose to have the worker pods created together with the master pod, but wait until it is running (via init containers that check master pod readiness). Did I get it right?
Any other ideas? I will work on this once the discussion is done.
Solution 2 looks OK to me. I would wait for @gaocegege's and @richardsliu's comments.
One other quick fix would be to skip the workers waiting to be created when gang scheduling is enabled. (https://github.com/kubeflow/pytorch-operator/blob/master/pkg/controller.v1/pytorch/pod.go#L60)
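Roughly, the check being skipped would look like this; a sketch with assumed names, not the file's actual code:

```go
package pytorch

// shouldWaitForMaster sketches the proposed quick fix: workers wait for
// the master only when gang scheduling is disabled, because kube-batch
// will not schedule the podgroup until every pod in it has been created.
// The parameter names are assumptions for illustration.
func shouldWaitForMaster(replicaType string, enableGangScheduling, masterRunning bool) bool {
	if replicaType != "Worker" {
		return false // the master itself never waits
	}
	if enableGangScheduling {
		return false // create all pods immediately and let kube-batch gang them
	}
	return !masterRunning // default behavior: wait until the master is running
}
```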
Thanks for your issue.
the PyTorch operator will not create the worker pods until the master pod is running.
Is it a limitation of PyTorch?
Yes. PyTorch workers will crash if the master is not up. Reference: https://github.com/kubeflow/pytorch-operator/issues/125
One other quick fix would be to skip the workers waiting to be created when gang scheduling is enabled. (https://github.com/kubeflow/pytorch-operator/blob/master/pkg/controller.v1/pytorch/pod.go#L60)
If we skip the workers waiting to be created when gang scheduling is enabled, the ordering within the gang schedule will not be guaranteed. The quick fix I mean is to exclude the master from the batch and not add the kube-batch annotation to the master: https://github.com/kubeflow/pytorch-operator/blob/6aa39a411eb9fe5e62b54b0a3cf09225abe7d551/pkg/controller.v1/pytorch/controller.go#L440
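As a sketch of that quick fix (the annotation key is the one kube-batch documents; the function and replica-type names are assumptions), the controller would set the gang-scheduling metadata only on worker pods:

```go
package pytorch

import corev1 "k8s.io/api/core/v1"

// setGangSchedulingSpec sketches the "exclude the master" quick fix:
// only worker pods get the kube-batch scheduler name and podgroup
// annotation, so the default scheduler places the master first while
// kube-batch gangs the workers. Names here are illustrative assumptions.
func setGangSchedulingSpec(podTemplate *corev1.PodTemplateSpec, replicaType, podGroupName string) {
	if replicaType == "Master" {
		return // leave the master to the default scheduler
	}
	podTemplate.Spec.SchedulerName = "kube-batch"
	if podTemplate.Annotations == nil {
		podTemplate.Annotations = map[string]string{}
	}
	podTemplate.Annotations["scheduling.k8s.io/group-name"] = podGroupName
}
```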
Instead of worker pods waiting to be created until the master pod is running, you propose to have the worker pods created together with the master pod, but wait until it is running (via init containers that check master pod readiness). Did I get it right?
@johnugeorge You got it right.
The quick fix I mean is to exclude the master from the batch and not add the kube-batch annotation to the master.
Then how can we get the benefit from gang scheduling?
Solution 2 looks OK to me.
One other quick fix would be to skip the workers waiting to be created when gang scheduling is enabled. (https://github.com/kubeflow/pytorch-operator/blob/master/pkg/controller.v1/pytorch/pod.go#L60)
If we skip the workers waiting to be created when gang scheduling is enabled, the ordering within the gang schedule will not be guaranteed.
What do you mean by "the ordering will not be guaranteed"? I meant that if gang scheduling is enabled, we don't do an explicit check that the master is running before the worker pods are created; i.e., start all pods immediately (like the initial implementation).
The quick fix I mean is to exclude the master from the batch and not add the kube-batch annotation to the master.
I think this would be a hack. Solution 2 is the better solution.
The quick fix I mean is to exclude the master from the batch and not add the kube-batch annotation to the master.
Then how can we get the benefit from gang scheduling?
This way, the benefit applies only to the worker pods.
What do you mean by "the ordering will not be guaranteed"? I meant that if gang scheduling is enabled, we don't do an explicit check that the master is running before the worker pods are created; i.e., start all pods immediately (like the initial implementation).
No matter whether gang scheduling is enabled, the worker pods should not come up until the master pod is running.
If there are no objections, I will work on solution 2.
Sure.
@zlcnju I have the same problem, but I don't understand your solution 2. Can you describe what the init container does? How do I write my YAML?
In solution 2, all pods start at the same time, and the ordering between the master and the workers is ensured by the init container. With the PR, the YAML is the same. But if you want your own custom init container, just modify the pytorch-operator configmap.
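If you do customize it, here is a rough sketch of how the operator could render a user-supplied init container template from that configmap, substituting the master address. The {{.MasterAddr}} placeholder and the template format are assumptions; check the PR and the configmap for the real format:

```go
package pytorch

import (
	"bytes"
	"text/template"
)

// renderInitContainer sketches rendering a custom init container
// definition taken from the pytorch-operator configmap. The template
// text and the {{.MasterAddr}} placeholder are assumptions for
// illustration, not the operator's actual configuration format.
func renderInitContainer(tmplText, masterAddr string) (string, error) {
	tmpl, err := template.New("initContainer").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, struct{ MasterAddr string }{MasterAddr: masterAddr}); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```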