open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

How to set up a multi-GPU environment like dist_train.sh? #10532

Open hoya-cho opened 1 year ago

hoya-cho commented 1 year ago

How can I set up a multi-GPU environment inside my own program, the way dist_train.sh does with torch.distributed.launch?

Even if I set the master port and master address in os.environ, local_rank does not iterate over the GPUs, because the second GPU is not picked up.

torch.cuda.device_count() reports 2 GPUs, but the environment configured through os.environ only picks up 1.


Runtime environment:
    cudnn_benchmark: True
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: None
    Distributed launcher: pytorch
    Distributed training: True
    GPU number: 1
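For context, a likely reason for "GPU number: 1" is that torch.distributed.launch does more than set MASTER_ADDR and MASTER_PORT: it starts one process per GPU and gives each process its own RANK, LOCAL_RANK and WORLD_SIZE. Setting variables in os.environ inside a single process does not create the second process. The sketch below imitates that behaviour for a single node using only the standard library. The names build_rank_envs and launch are hypothetical helpers, not part of PyTorch or MMDetection, and the default address and port are assumptions.

```python
import os
import subprocess
import sys


def build_rank_envs(nproc, master_addr="127.0.0.1", master_port=29500, base_env=None):
    """Return one environment dict per worker process (single node only)."""
    base = dict(os.environ if base_env is None else base_env)
    envs = []
    for local_rank in range(nproc):
        env = dict(base)
        env.update({
            "MASTER_ADDR": master_addr,       # rendezvous address, same for all workers
            "MASTER_PORT": str(master_port),  # rendezvous port, same for all workers
            "WORLD_SIZE": str(nproc),         # total number of processes
            "RANK": str(local_rank),          # single node: global rank == local rank
            "LOCAL_RANK": str(local_rank),    # index of the GPU this worker should use
        })
        envs.append(env)
    return envs


def launch(script, script_args, nproc):
    """Start `nproc` copies of `script`, one per GPU, and return their exit codes."""
    procs = [
        subprocess.Popen([sys.executable, script, *script_args], env=env)
        for env in build_rank_envs(nproc)
    ]
    return [p.wait() for p in procs]
```

Assuming the training entry point accepts --launcher pytorch as tools/train.py does, a call such as launch("tools/train.py", [config_path, "--launcher", "pytorch"], nproc=2) would start two workers, where config_path is a placeholder for your own config file. Each worker then reads its own LOCAL_RANK instead of sharing one process-wide value.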

narchitect commented 4 months ago

Hi @hoya-cho, I have the same problem. Did you manage to solve it?