Closed AlexIlis closed 6 months ago
Can you suggest how to set up multi-GPU, multi-node training with torchpack?
I have set -H ip1:gpus,ip2:gpus and launched training from both nodes, but they don't seem to find each other. What am I missing here?
Could you try to SSH into ip1 and ip2? You need to make sure that both machines can be reached over SSH without a password prompt.
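In case it helps, here is a rough way to run that check from a script. It is only a sketch: `parse_hosts`, `ssh_check_command`, and `can_ssh_without_password` are hypothetical helper names, not part of torchpack, and it assumes the -H value has the `host:slots` form shown in the question. It relies on the standard OpenSSH option `BatchMode=yes`, which makes `ssh` fail instead of asking for a password.

```python
import subprocess


def parse_hosts(spec):
    """Split a -H style value such as 'ip1:8,ip2:8' into host names.

    Assumes each entry looks like 'host:slots' (hypothetical helper).
    """
    return [entry.split(":")[0].strip() for entry in spec.split(",") if entry.strip()]


def ssh_check_command(host, timeout=5):
    """Build an ssh command that never prompts for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # fail rather than prompt
        "-o", f"ConnectTimeout={timeout}",   # give up quickly on unreachable hosts
        host,
        "true",                              # no-op command on the remote side
    ]


def can_ssh_without_password(host, run=subprocess.run):
    """Return True if the no-prompt ssh command exits with status 0."""
    result = run(ssh_check_command(host), capture_output=True)
    return result.returncode == 0


def report(spec, run=subprocess.run):
    """Map each host in the -H value to whether passwordless SSH works."""
    return {host: can_ssh_without_password(host, run=run) for host in parse_hosts(spec)}
```

Calling `report("ip1:gpus,ip2:gpus")` on each machine (with the real addresses filled in) should show `True` for every host; any `False` entry points at a machine that still needs key-based login set up.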