Open crml233 opened 8 months ago
Hi, I just ran into the same problem. I modified these three places, then ran benchmark.py, and it now works on Windows. Hope it helps.
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '12345'
os.environ['RANK'] = '0'
os.environ['WORLD_SIZE'] = '1'
os.environ["PL_TORCH_DISTRIBUTED_BACKEND"] = 'gloo'
init_dist(args.launcher, 'gloo')
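For anyone who wants to see what these settings amount to outside of benchmark.py, here is a minimal sketch, assuming a single local process. The helper names `set_single_process_env` and `init_gloo_process_group` are hypothetical (they are not part of MMRotate or MMEngine). The only PyTorch calls used are `torch.distributed.is_initialized` and `torch.distributed.init_process_group`, whose default `env://` initialization reads the four variables above. Unlike the snippet above, it uses `setdefault` so values already set by a launcher are not overwritten.

```python
import os

# Hypothetical defaults for a one-process run; values copied from the comment above.
SINGLE_PROCESS_ENV = {
    "MASTER_ADDR": "localhost",
    "MASTER_PORT": "12345",
    "RANK": "0",
    "WORLD_SIZE": "1",
}


def set_single_process_env(environ=os.environ):
    """Set each rendezvous variable only if it is not already present."""
    for key, value in SINGLE_PROCESS_ENV.items():
        environ.setdefault(key, value)
    # Return the effective values so the caller can log or inspect them.
    return {key: environ[key] for key in SINGLE_PROCESS_ENV}


def init_gloo_process_group():
    """Initialize a one-process group with the gloo backend (requires PyTorch)."""
    import torch.distributed as dist  # imported lazily so the helper above works without torch

    set_single_process_env()
    if not dist.is_initialized():
        # With no init_method given, PyTorch uses env:// and reads the variables set above.
        dist.init_process_group(backend="gloo")
```

Calling `init_gloo_process_group()` once before the benchmark builds its model should have roughly the same effect as the edits above, but I have only reasoned about this, not tested it against benchmark.py.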
Thank you very much! It works!!
Prerequisite
Task
I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.
Branch
master branch https://github.com/open-mmlab/mmrotate
Environment
sys.platform: linux
Python: 3.8.16 (default, Jun 12 2023, 18:09:05) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA TITAN X (Pascal)
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.24
GCC: gcc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:
TorchVision: 0.13.1
OpenCV: 4.8.0
MMEngine: 0.7.4
MMRotate: 1.0.0rc1+4aae1fc
Reproduces the problem - code sample
When --task is left at its default value, 'dataloader', the following command works.
CUDA_VISIBLE_DEVICES=3 python -m torch.distributed.launch --nproc_per_node=1 --master_port=29500 tools/analysis_tools/benchmark.py /home/czj/mmrotate/cfg_ship/SUBSRS/sub1/fcos_sub1_100e.py --checkpoint /home/czj/mmrotate/work_dirs/fcos_sub1_100e/epoch_100.pth --launcher pytorch
e.g.:
.............
03/19 10:20:35 - mmengine - INFO - ============== Done ==================
03/19 10:20:35 - mmengine - INFO - Overall fps: 120.2 batch/s, times per batch: 8.3 ms/batch, batch size: 1, num_workers: 2
03/19 10:20:35 - mmengine - INFO - (GB) mem_used: 9.38 | uss: 0.11 | pss: 0.36 | total_proc: 3
.........
But when I change 'dataloader' to 'inference', an error occurs:
Reproduces the problem - command or script
CUDA_VISIBLE_DEVICES=3 python -m torch.distributed.launch --nproc_per_node=1 --master_port=29500 tools/analysis_tools/benchmark.py /home/czj/mmrotate/cfg_ship/SUBSRS/sub1/fcos_sub1_100e.py --checkpoint /home/czj/mmrotate/work_dirs/fcos_sub1_100e/epoch_100.pth --launcher pytorch
with the default value of '--task' changed to 'inference' inside tools/analysis_tools/benchmark.py
or
CUDA_VISIBLE_DEVICES=3 python -m torch.distributed.launch --nproc_per_node=1 --master_port=29500 tools/analysis_tools/benchmark.py /home/czj/mmrotate/cfg_ship/SUBSRS/sub1/fcos_sub1_100e.py --checkpoint /home/czj/mmrotate/work_dirs/fcos_sub1_100e/epoch_100.pth --task inference --launcher pytorch
Reproduces the problem - error message
Additional information
No response