kubeflow / pytorch-operator

PyTorch on Kubernetes
Apache License 2.0
306 stars 143 forks

Does pytorch-operator just simplify the use of nn.parallel.DistributedDataParallel across multiple nodes with multiple GPUs? #311

Closed lwj1980s closed 3 years ago

lwj1980s commented 3 years ago

Setting aside the Kubernetes-specific aspects, can I think of pytorch-operator as a kind of one-click deployment of PyTorch DistributedDataParallel training?

gaocegege commented 3 years ago

Yeah, I think you can. But we do not just deploy; we also maintain the lifecycle of the training jobs.
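
To make this concrete: the operator launches the Master/Worker replicas defined in a PyTorchJob and injects MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE into each pod, so the training script itself only needs the standard env:// initialization. A minimal sketch of such a script (assuming one process per pod, not code from this repo) might look like:

```python
# Minimal DDP sketch, assuming pytorch-operator has injected MASTER_ADDR,
# MASTER_PORT, RANK, and WORLD_SIZE into each replica's environment.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # With no explicit init_method, PyTorch defaults to "env://", i.e. it reads
    # MASTER_ADDR / MASTER_PORT / RANK / WORLD_SIZE from the environment,
    # which is exactly what the operator sets for every Master/Worker pod.
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")

    model = nn.Linear(10, 1)
    if torch.cuda.is_available():
        torch.cuda.set_device(0)  # one GPU per pod in the simplest layout
        model = DDP(model.cuda(), device_ids=[0])
    else:
        model = DDP(model)

    # ... ordinary training loop; DDP all-reduces gradients across all replicas ...

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The point gaocegege makes above is that the operator does more than set up this rendezvous: it also watches the pods, restarts or fails the job according to its policy, and cleans up when training finishes.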

lwj1980s commented 3 years ago

> Yeah, I think you can. But we do not just deploy; we also maintain the lifecycle of the training jobs.

Thank you very much, I've got it.