kubeflow / pytorch-operator

PyTorch on Kubernetes
Apache License 2.0

GCP preemptible instances #237

Open Nintorac opened 4 years ago

Nintorac commented 4 years ago

Just wondering how this operator handles being run on preemptible GCP instances, and where I can find more documentation on the subject.

Thanks

johnugeorge commented 4 years ago

Can you explain more about your requirements?

NikeNano commented 4 years ago

Will PyTorch jobs be scheduled to preemptible GPUs by default if they are present?

dmitsf commented 2 years ago

The same question. How will PyTorchJob (PJ) pods behave on instances that can stop, resume, or reschedule workloads?

gaocegege commented 2 years ago

> The same question. How will PJ pods behave on instances which can stop/resume/reschedule workloads?

It depends on your training code and the restartPolicy you define in the PyTorchJob YAML. I don't think we treat preemptible instances as a special case.
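To illustrate what "depends on the training code" can mean in practice, here is a minimal sketch of a resumable training loop. It assumes the pod is restarted after a preemption (for example under a restartPolicy such as OnFailure) and that the checkpoint path is on storage that outlives the pod. The names `save_checkpoint`, `load_checkpoint`, `train`, and the `stop_after` parameter are hypothetical and not part of the operator. The sketch uses only the standard library and a JSON file as a stand-in; real PyTorch code would typically save and load model and optimizer state with `torch.save` and `torch.load` instead.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so a preemption mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "total": 0}


def train(path, num_epochs, stop_after=None):
    """Run (or resume) training; stop_after (hypothetical) simulates a preemption."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], num_epochs):
        if stop_after is not None and epoch >= stop_after:
            return state  # simulated preemption: the process would die here
        state["total"] += epoch  # stand-in for one epoch of real training
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)  # progress survives a restart
    return state
```

With a loop like this, a restarted pod picks up at the last completed epoch instead of starting from scratch. If the training code keeps no checkpoint, a restart simply reruns the job from the beginning, whatever the restartPolicy is.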