zhiyuanyou / UniAD

[NeurIPS 2022 Spotlight] A Unified Model for Multi-class Anomaly Detection
Apache License 2.0
250 stars 28 forks

torch.distributed.launch is deprecated #7

Closed wyanb closed 2 years ago

wyanb commented 2 years ago

```
/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions
  FutureWarning,
```
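For context (not part of the original report): the warning only advises a migration path. A script that currently expects `--local_rank` can be made compatible with both `torch.distributed.launch` and `torchrun` with a small shim along these lines (the helper name `get_local_rank` is illustrative, not from the repo):

```python
import argparse
import os


def get_local_rank() -> int:
    """Return the local rank under either launcher (illustrative helper)."""
    # torchrun (and launch with --use_env) sets the LOCAL_RANK env var.
    if "LOCAL_RANK" in os.environ:
        return int(os.environ["LOCAL_RANK"])
    # Legacy torch.distributed.launch passes --local_rank on the CLI.
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    args, _ = parser.parse_known_args()
    return args.local_rank
```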

It was able to run before, but today it suddenly hit this problem. Please kindly take a look, thank you.

zhiyuanyou commented 2 years ago

Hi, what version of PyTorch are you using?

wyanb commented 2 years ago

It is 1.8

zhiyuanyou commented 2 years ago

Well, we suggest using PyTorch 1.5. But what you posted here is only a warning; it should not affect the run. Could you post more of the log?

wyanb commented 2 years ago

```
/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions
  FutureWarning,
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 252, in launch_agent
    result = agent.run()
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper
    result = f(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run
    result = self._invoke_run(role)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/elastic/agent/server/api.py", line 837, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper
    result = f(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/elastic/agent/server/api.py", line 678, in _initialize_workers
    self._rendezvous(worker_group)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper
    result = f(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/elastic/agent/server/api.py", line 538, in _rendezvous
    store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous()
  File "/home/ubuntu/anaconda3/envs/bisenet/lib/python3.6/site-packages/torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py", line 61, in next_rendezvous
    multi_tenant=True,
RuntimeError: Address already in use
```

The GPUs are a 3080 and a 3090, and no one was using them at that time. I don't know why it still failed to run.
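As an aside on the error itself: `RuntimeError: Address already in use` comes from the rendezvous TCP port being taken, not from the GPUs being busy. One quick way to check whether a given master port is free before launching (a small sketch, the function name is illustrative):

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 when the connection succeeds,
        # i.e. some process is already bound to that port.
        return s.connect_ex((host, port)) == 0
```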

zhiyuanyou commented 2 years ago

In your response:

> The servers are 3080 and 3090, and no one was using them at that time. I don't know why they still failed to run.

Can I understand that you used 2 GPUs and gave 2 different ports to these 2 GPUs? If that is the case, you should give both GPUs the same port.
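In other words, a single launcher process should drive both GPUs, sharing one rendezvous port. A sketch of such an invocation (the script name `train_val.py` and the port number are placeholders, not necessarily the repo's actual entry point):

```shell
# One launcher, two workers, one shared --master_port.
# 29501 is arbitrary -- any free TCP port works.
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
    --nproc_per_node=2 \
    --master_port=29501 \
    train_val.py  # placeholder for the training entry point
```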

wyanb commented 2 years ago

Thank you for your answer. Does that mean I need to reassemble the hardware?

wyanb commented 2 years ago

I think I know how to fix this. Thank you