File "/work/siamban-master/tools/siamban/utils/distributed.py", line 104, in dist_init
rank, world_size = _dist_init()
File "/work/siamban-master/tools/siamban/utils/distributed.py", line 86, in _dist_init
dist.init_process_group(backend='nccl')
File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 400, in init_process_group
store, rank, world_size = next(rendezvous(url))
File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 143, in _env_rendezvous_handler
store = TCPStore(master_addr, master_port, world_size, start_daemon)
RuntimeError: Address already in use
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "../../tools/train.py", line 310, in <module>
main()
File "../../tools/train.py", line 249, in main
rank, world_size = dist_init()
File "/work/siamban-master/tools/siamban/utils/distributed.py", line 111, in dist_init
raise RuntimeError(*e.args)
RuntimeError: Address already in use
Traceback (most recent call last):
File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 253, in <module>
main()
File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 249, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/envs/pytorch-py3.6/bin/python', '-u', '../../tools/train.py', '--local_rank=2', '--cfg', 'config.yaml']' returned non-zero exit status 1.
I get this error. I am using a single machine with multiple GPUs.
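For the multi-GPU case, "Address already in use" usually means the rendezvous TCP port (`torch.distributed.launch` defaults to 29500) is still held by a previous or crashed training run, or by a second job on the same machine. One workaround is to pick a free port at runtime and pass it via the launcher's `--master_port` flag. The helper below is an illustrative sketch, not part of SiamBAN:

```python
import socket

def find_free_port():
    """Ask the OS for an unused TCP port to use as MASTER_PORT."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))           # port 0: the kernel picks a free port
        return s.getsockname()[1]

if __name__ == "__main__":
    # Print the port so a launch script can do, e.g.:
    #   python -m torch.distributed.launch --master_port=$PORT ...
    print(find_free_port())
```

Alternatively, find and kill the stale process still holding port 29500 (for example with `lsof -i:29500`) before relaunching.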
When training on a single GPU with `python3 -u ../../tools/train.py --cfg config.yaml`, I get this error:
Traceback (most recent call last):
File "../../tools/train.py", line 310, in <module>
main()
File "../../tools/train.py", line 249, in main
rank, world_size = dist_init()
File "/work/siamban-master/tools/siamban/utils/distributed.py", line 104, in dist_init
rank, world_size = _dist_init()
File "/work/siamban-master/tools/siamban/utils/distributed.py", line 83, in _dist_init
rank = int(os.environ['RANK'])
File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/os.py", line 669, in __getitem__
raise KeyError(key) from None
KeyError: 'RANK'
What kind of problem is this, and how should I handle it?
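The single-GPU `KeyError` occurs because `_dist_init` reads `RANK` from the environment, and that variable is only exported by the `torch.distributed.launch` wrapper. If you want to keep calling `dist_init` from a plain single-process run, one workaround (a sketch, assuming the env:// rendezvous this code uses) is to export single-process defaults before it runs:

```python
import os

# Single-process defaults for the env:// rendezvous; these are the
# variables torch.distributed.launch would normally export for you.
# The values below are conventional defaults, not mandated by SiamBAN.
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("LOCAL_RANK", "0")
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
```

Alternatively, launch the single-GPU run through the same wrapper so the variables are set for you: `python -m torch.distributed.launch --nproc_per_node=1 ../../tools/train.py --cfg config.yaml`.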
I commented out every dataset in config.yaml except COCO and trained on COCO only. Running the command inside a MobaXterm container reports the same "Address already in use" and `KeyError: 'RANK'` errors shown above.