yucornetto / MGMatting

This repository includes the official implementation of Mask Guided (MG) Matting, presented in our paper: Mask Guided Matting via Progressive Refinement Network

distribution training #9

Closed: EDGSCOUT closed this issue 3 years ago

EDGSCOUT commented 3 years ago

```text
Traceback (most recent call last):
  File "main.py", line 107, in <module>
    main()
  File "main.py", line 27, in main
    torch.distributed.init_process_group(backend='nccl', init_method='env://')
  File "/home/ps/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 423, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/home/ps/anaconda3/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 179, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
RuntimeError: Address already in use
Torch Version: 1.7.0
Torch Version: 1.7.0
Traceback (most recent call last):
  File "main.py", line 107, in <module>
    main()
  File "main.py", line 27, in main
    torch.distributed.init_process_group(backend='nccl', init_method='env://')
  File "/home/ps/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 423, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/home/ps/anaconda3/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 179, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/ps/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ps/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ps/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/ps/anaconda3/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ps/anaconda3/bin/python', '-u', 'main.py', '--local_rank=3', '--config=config/MGMatting-DIM.toml']' returned non-zero exit status 1.

(base) ps@ps:~/Downloads/MGMatting-main/code-base$ Traceback (most recent call last):
  File "main.py", line 107, in <module>
    main()
  File "main.py", line 27, in main
    torch.distributed.init_process_group(backend='nccl', init_method='env://')
  File "/home/ps/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 442, in init_process_group
    barrier()
  File "/home/ps/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1947, in barrier
    work = _default_pg.barrier()
RuntimeError: Broken pipe
Traceback (most recent call last):
  File "main.py", line 107, in <module>
    main()
  File "main.py", line 27, in main
    torch.distributed.init_process_group(backend='nccl', init_method='env://')
  File "/home/ps/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 442, in init_process_group
    barrier()
  File "/home/ps/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1947, in barrier
    work = _default_pg.barrier()
RuntimeError: Broken pipe
Traceback (most recent call last):
  File "main.py", line 107, in <module>
    main()
  File "main.py", line 27, in main
    torch.distributed.init_process_group(backend='nccl', init_method='env://')
  File "/home/ps/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 442, in init_process_group
    barrier()
  File "/home/ps/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1947, in barrier
    work = _default_pg.barrier()
RuntimeError: Broken pipe
```
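The first failure above is raised inside `TCPStore` with `RuntimeError: Address already in use`, which typically means the rendezvous port (`MASTER_PORT`) is still held by another process, for example a leftover worker from an earlier run; the `Broken pipe` errors from the other ranks can follow once that store fails. This thread was ultimately resolved as a version problem, so the following is only a side check, not the fix. It is a minimal stdlib-only sketch, assuming the master process runs on the local machine; `port_is_free` and `find_free_port` are hypothetical helper names.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if no process currently holds (host, port) for TCP."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process (e.g. a stale worker) owns the port.
            return False
        return True


def find_free_port(host: str = "127.0.0.1") -> int:
    """Let the OS pick an unused ephemeral port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    default_port = 29500  # default --master_port of torch.distributed.launch
    if port_is_free(default_port):
        print(f"port {default_port} is free")
    else:
        print(f"port {default_port} is busy; try --master_port={find_free_port()}")
```

If the default port is busy, a different one can be passed to the launcher, e.g. `python -m torch.distributed.launch --nproc_per_node=4 --master_port=29501 main.py --config=config/MGMatting-DIM.toml` (the GPU count of 4 is an assumption inferred from `--local_rank=3` in the log). Note that a port reported as free can still be taken by another process before the launcher binds it.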

yucornetto commented 3 years ago

Hi, could you check your PyTorch and CUDA versions? It seems to me that the problem may be caused by a library version issue.
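PyTorch ships `python -m torch.utils.collect_env`, which prints a full environment report. For a quicker look, the sketch below gathers only the versions most relevant to an NCCL `init_process_group` failure. `collect_versions` is a hypothetical helper; it leaves any value it cannot detect as `None` (for instance when PyTorch is not installed or was built without CUDA).

```python
import platform


def collect_versions() -> dict:
    """Collect Python/PyTorch/CUDA/cuDNN/NCCL versions relevant to distributed training.

    Values that cannot be detected stay None; "gpus" defaults to 0.
    """
    info = {"python": platform.python_version(), "torch": None, "cuda": None,
            "cudnn": None, "nccl": None, "gpus": 0}
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed in this interpreter

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # CUDA version PyTorch was built against; None on CPU builds
    info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    if torch.cuda.is_available():
        info["gpus"] = torch.cuda.device_count()
        try:
            import torch.cuda.nccl as nccl
            info["nccl"] = nccl.version()  # an int or a tuple, depending on the PyTorch release
        except (ImportError, AttributeError, RuntimeError):
            pass  # build without usable NCCL support
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name:>6}: {value}")
```

Running this under the same interpreter that `torch.distributed.launch` uses (here `/home/ps/anaconda3/bin/python`) avoids checking a different environment by mistake.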

EDGSCOUT commented 3 years ago

Yes, I have solved it. It was a version problem. We can close this issue.