nyu-systems / Grendel-GS

Ongoing research training gaussian splatting at scale by distributed system
Apache License 2.0

Network error #9

Closed: Vilour closed this issue 1 month ago

Vilour commented 1 month ago

Hi,

I tried to run python -m torch.distributed.run --standalone --nnodes=1 --nproc-per-node=4 train.py --bsz 4 -s but it returned the following error:

[W socket.cpp:601] [c10d] The IPv6 network addresses of (user, 53821) cannot be retrieved (gai error: -3 - Temporary failure in name resolution).

What does this mean?

Vilour commented 1 month ago

This is a torchrun error. Solved by adding the line 127.0.0.1 user to /etc/hosts (user is the hostname that appears in the warning above).
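
For anyone hitting the same warning, here is a minimal sketch for checking whether the local hostname resolves before launching torchrun. It only uses Python's standard socket module. The helper names resolves and hosts_entry_exists are made up for this example and are not part of torchrun or Grendel-GS, and the check assumes the warning comes from the machine's own hostname failing to resolve, as it did here.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the system resolver can map hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Same class of failure as "Temporary failure in name resolution".
        return False


def hosts_entry_exists(hosts_text: str, hostname: str) -> bool:
    """Return True if hosts-file text contains an active line naming hostname.

    hosts_text is the content of a hosts file (hypothetical helper input).
    """
    for line in hosts_text.splitlines():
        # Drop comments, then split into: address, name, optional aliases.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return True
    return False


if __name__ == "__main__":
    name = socket.gethostname()
    print(f"hostname {name!r} resolves: {resolves(name)}")
    try:
        with open("/etc/hosts", encoding="utf-8") as fh:
            print(f"listed in /etc/hosts: {hosts_entry_exists(fh.read(), name)}")
    except OSError as exc:
        print(f"could not read /etc/hosts: {exc}")
```

If the hostname does not resolve and is not listed, adding a line such as 127.0.0.1 followed by that hostname to /etc/hosts (as above) should make the check pass.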