hqucv / siamban

Siamese Box Adaptive Network for Visual Tracking
Apache License 2.0

Training the problem #51

Open wyq2020022029 opened 3 years ago

wyq2020022029 commented 3 years ago

I commented out every dataset in config.yaml except COCO and trained on COCO only. Running the command inside a container (through MobaXterm) reports:

  File "/work/siamban-master/tools/siamban/utils/distributed.py", line 104, in dist_init
    rank, world_size = _dist_init()
  File "/work/siamban-master/tools/siamban/utils/distributed.py", line 86, in _dist_init
    dist.init_process_group(backend='nccl')
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 400, in init_process_group
    store, rank, world_size = next(rendezvous(url))
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 143, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, world_size, start_daemon)
RuntimeError: Address already in use

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../../tools/train.py", line 310, in <module>
    main()
  File "../../tools/train.py", line 249, in main
    rank, world_size = dist_init()
  File "/work/siamban-master/tools/siamban/utils/distributed.py", line 111, in dist_init
    raise RuntimeError(*e.args)
RuntimeError: Address already in use

Traceback (most recent call last):
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/envs/pytorch-py3.6/bin/python', '-u', '../../tools/train.py', '--local_rank=2', '--cfg', 'config.yaml']' returned non-zero exit status 1.

That is the error I get when using a single machine with multiple GPUs.

When training on a single GPU, running python3 -u ../../tools/train.py --cfg config.yaml fails with:

Traceback (most recent call last):
  File "../../tools/train.py", line 310, in <module>
    main()
  File "../../tools/train.py", line 249, in main
    rank, world_size = dist_init()
  File "/work/siamban-master/tools/siamban/utils/distributed.py", line 104, in dist_init
    rank, world_size = _dist_init()
  File "/work/siamban-master/tools/siamban/utils/distributed.py", line 83, in _dist_init
    rank = int(os.environ['RANK'])
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'RANK'

What is causing this, and how should I handle it?

zeduchen commented 3 years ago

For single node, single GPU or multiple GPUs training, please refer to here.
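
For readers hitting the same two errors, here is a minimal sketch of what the tracebacks point to. It is not the repository's own fix. The tracebacks show that _dist_init reads os.environ['RANK'] and then calls dist.init_process_group(backend='nccl'), which in PyTorch's default env:// mode also reads MASTER_ADDR, MASTER_PORT and WORLD_SIZE. "Address already in use" usually means the chosen master port is taken by another process (possibly a leftover training run), and KeyError: 'RANK' means the script was started without a launcher that sets these variables. The helper names find_free_port and single_process_env below are hypothetical and exist only for illustration.

```python
import os
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a TCP port that is currently unused on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick any free port
        return sock.getsockname()[1]


def single_process_env(host="127.0.0.1"):
    """Build the variables that PyTorch's env:// rendezvous reads.

    Assumption: one process on one machine, so RANK is 0 and WORLD_SIZE is 1.
    """
    return {
        "MASTER_ADDR": host,
        "MASTER_PORT": str(find_free_port(host)),
        "RANK": "0",
        "WORLD_SIZE": "1",
    }


if __name__ == "__main__":
    env = single_process_env()
    # Export the variables before any call to dist.init_process_group().
    os.environ.update(env)
    print(env)
```

The port is only known to be free at the moment it is probed, so another process could still grab it before training starts. For the multi-GPU case, torch.distributed.launch accepts --nproc_per_node and --master_port, so passing a port that is not in use there should avoid the first error; whether the training script needs anything else is not confirmed by this thread.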