vv-p opened this issue 3 years ago.
I also faced this problem, but I managed to resolve it. In my case, the cause was launching the jobs with torchelastic instead of torch. The standalone torchelastic package is now deprecated because TorchElastic became part of core PyTorch in 1.9 (https://pytorch.org/blog/pytorch-1.9-released/#beta-torchelastic-is-now-part-of-core).
Specifically, I changed:
python3 -m torchelastic.distributed.launch
to (see the documentation):
python3 -m torch.distributed.run
After that, everything worked again.
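For anyone who builds launch commands from a script, here is a minimal sketch of the same change. The helper `migrate_launch_command` is a hypothetical name (not part of torch or torchelastic). It swaps the launcher module in a `python -m ...` command line and leaves every other argument untouched. It assumes the remaining launcher flags are accepted unchanged by `torch.distributed.run`, so check them against the documentation for your torch version.

```python
import shlex

# Module paths of the deprecated and the current launcher.
OLD_LAUNCHER = "torchelastic.distributed.launch"
NEW_LAUNCHER = "torch.distributed.run"


def migrate_launch_command(command: str) -> str:
    """Rewrite `python -m torchelastic.distributed.launch ...` to use
    `python -m torch.distributed.run ...`, keeping all other arguments.

    Assumption (hypothetical helper): the other flags carry over unchanged.
    """
    args = shlex.split(command)
    try:
        m_index = args.index("-m")
    except ValueError:
        return command  # not a `python -m <module>` call; leave it untouched
    # Only replace the module name if it is the deprecated launcher.
    if m_index + 1 < len(args) and args[m_index + 1] == OLD_LAUNCHER:
        args[m_index + 1] = NEW_LAUNCHER
    return shlex.join(args)  # shlex.join requires Python 3.8+


if __name__ == "__main__":
    old = "python3 -m torchelastic.distributed.launch --nnodes=1 --nproc_per_node=2 train.py"
    print(migrate_launch_command(old))
```

Commands that already use `torch.distributed.run`, or that are not `python -m` invocations at all, are returned as they were, so the helper is safe to apply to a whole list of job commands.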
Hi,
I get the following error when I try to run my code with torchelastic:
Steps to reproduce:
I've tried several different versions of torch and torchelastic (including the latest stable release), but nothing changed and the error persists. Could you please help me understand what this error means and how I can fix it?
Environment:
OS: CentOS 7
Python: 3.8.3
torch: 1.9.0
torchelastic: 0.2.2
python-etcd: 0.4.5