nyu-systems / Grendel-GS

Ongoing research training gaussian splatting at scale by distributed system
Apache License 2.0
380 stars 20 forks

How to train using 16 or more GPUs spread across different nodes? #19

Closed by kaplaton 2 months ago

TarzanZhao commented 3 months ago

Hi, we launch training with torchrun, and torchrun supports multi-node jobs, so our repository can train across multiple nodes without any changes.

Please refer to https://discuss.pytorch.org/t/how-could-we-use-torchrun-to-start-multi-node-training/138039

https://pytorch.org/docs/stable/elastic/run.html
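As a rough illustration of what the linked pages describe, here is a minimal Python sketch that assembles the torchrun command each node would run for a 16-GPU job split over 2 nodes with 8 GPUs each. The flags (`--nnodes`, `--nproc_per_node`, `--node_rank`, `--master_addr`, `--master_port`) are standard torchrun options; the helper `build_torchrun_cmd`, the address `10.0.0.1`, the port, and the `train.py` entry point with no extra arguments are assumptions made for this example, so substitute the script and arguments from the repository's README.

```python
import shlex


def build_torchrun_cmd(nnodes, nproc_per_node, node_rank, master_addr,
                       master_port=29500, script="train.py", script_args=()):
    """Build the torchrun command to run on ONE node of a multi-node job.

    Hypothetical helper for illustration; it only formats a command line
    and does not launch anything.
    """
    if not 0 <= node_rank < nnodes:
        raise ValueError("node_rank must be in the range [0, nnodes)")
    return [
        "torchrun",
        f"--nnodes={nnodes}",                  # total number of nodes
        f"--nproc_per_node={nproc_per_node}",  # GPUs (processes) per node
        f"--node_rank={node_rank}",            # 0 on the master node
        f"--master_addr={master_addr}",        # address reachable by all nodes
        f"--master_port={master_port}",        # free port on the master node
        script,                                # assumed training entry point
        *script_args,                          # training arguments, if any
    ]


if __name__ == "__main__":
    # 2 nodes x 8 GPUs = 16 GPUs; run the printed command on each node.
    for rank in range(2):
        cmd = build_torchrun_cmd(2, 8, rank, master_addr="10.0.0.1")
        print(f"node {rank}: {shlex.join(cmd)}")
```

Every node runs the same command except for `--node_rank`, and all nodes must be able to reach the master address and port. The rendezvous-based flags (`--rdzv_backend`, `--rdzv_endpoint`) described in the torchrun documentation are an alternative to the master address and node rank options.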

kaplaton commented 3 months ago

Thank you for your reply!