Closed by kaplaton 2 months ago
Hi, we launch training with torchrun, and torchrun supports multi-node jobs, so our repository can train on multiple nodes without code changes.
Please refer to https://discuss.pytorch.org/t/how-could-we-use-torchrun-to-start-multi-node-training/138039
https://pytorch.org/docs/stable/elastic/run.html
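For anyone landing here later, this is a rough sketch of what a multi-node launch could look like, not the repository's actual launch script. It only builds the torchrun command line to run on each node and shows how a training script might read the environment variables torchrun sets (RANK, LOCAL_RANK, WORLD_SIZE). The script name train.py, the address 10.0.0.1, the port 29500, and both helper function names are placeholders I made up; substitute your own entry point and master node.

```python
import os


def torchrun_command(node_rank, nnodes, nproc_per_node,
                     master_addr, master_port, script, script_args=()):
    """Build the torchrun argv to run on one node (hypothetical helper)."""
    return [
        "torchrun",
        f"--nnodes={nnodes}",                  # total number of machines
        f"--nproc_per_node={nproc_per_node}",  # processes (usually GPUs) per machine
        f"--node_rank={node_rank}",            # 0 on the master node, 1..N-1 elsewhere
        f"--master_addr={master_addr}",        # address reachable from every node
        f"--master_port={master_port}",        # free port on the master node
        script,
        *script_args,
    ]


def read_torchrun_env(env=None):
    """Read the rank info torchrun exports to each worker (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global rank across all nodes
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # rank within this node
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }


if __name__ == "__main__":
    # Placeholder values: 2 nodes with 8 GPUs each, train.py as the entry point.
    for node_rank in range(2):
        cmd = torchrun_command(node_rank, 2, 8, "10.0.0.1", 29500, "train.py")
        print(f"node {node_rank}: {' '.join(cmd)}")
```

Run the printed command on the matching node, with every node pointing at the same master address and port. Inside the training script, the values from read_torchrun_env are what you would typically use to pick the CUDA device (local_rank) before initializing the process group.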
Thank you for your reply!