yutinyang closed this 2 years ago
Hi, could you share the command and environment with me? Note that we use torch.distributed.launch
to run jobs. An example is:
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py \
--cfg configs/swin_tiny_patch4_window7_224.yaml --data-path <imagenet-path> --batch-size 128
You can find more information in get_started.
Thanks. I have changed my configuration to solve this problem.
@yutinyang Hi, when I run "python tools/test.py /home/dai/code/semantic_segmentation/25/Swin-Transformer-Semantic-Segmentation-main/configs/swin/upernet_swin_base_patch4_window7_512x512_160k_ade20k.py /home/dai/code/semantic_segmentation/25/Swin-Transformer-Semantic-Segmentation-main/upernet_swin_base_patch4_window7_512x512.pth --eval mIoU" for semantic segmentation, I get the following error: "AssertionError: fused_bias_leakyrelu miss in module _ext". Could you please tell me how to solve it? Thank you.
@yutinyang I am running on Ubuntu with CUDA 10.1 and Anaconda3.
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py \
--cfg configs/swin_tiny_patch4_window7_224.yaml --data-path
--batch-size 128
I used "python -m torch.distributed.launch --nproc_per_node 4 --master_port 12345 main.py \
--cfg configs/swin_tiny_patch4_window7_224.yaml --data-path
Error at line 310: ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set
@yutinyang Hi, has this problem been solved? I am hitting it too; see the output below.
(swin) G:\win10_tensorflow\Swin-Transformer-main>python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval --cfg configs/swin_base_patch4_window7_224.yaml --resume swin_base_patch4_window7_224.pth --data-path G:\image\imagenet
=> merge config from configs/swin_base_patch4_window7_224.yaml
RANK and WORLD_SIZE in environ: 0/1
Traceback (most recent call last):
  File "main.py", line 312, in <module>
    torch.distributed.init_process_group(backend='nccl', init_method='env://', world_size=world_size, rank=rank)
  File "G:\ProgramData\Anaconda3\envs\swin\lib\site-packages\torch\distributed\distributed_c10d.py", line 434, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "G:\ProgramData\Anaconda3\envs\swin\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for env://
Traceback (most recent call last):
  File "G:\ProgramData\Anaconda3\envs\swin\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "G:\ProgramData\Anaconda3\envs\swin\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "G:\ProgramData\Anaconda3\envs\swin\lib\site-packages\torch\distributed\launch.py", line 260, in <module>
    main()
  File "G:\ProgramData\Anaconda3\envs\swin\lib\site-packages\torch\distributed\launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['G:\ProgramData\Anaconda3\envs\swin\python.exe', '-u', 'main.py', '--local_rank=0', '--eval', '--cfg', 'configs/swin_base_patch4_window7_224.yaml', '--resume', 'swin_base_patch4_window7_224.pth', '--data-path', 'G:\image\imagenet']' returned non-zero exit status 1.
This works for me. https://blog.csdn.net/qq_36622589/article/details/117913064
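Separately, on the "No rendezvous handler for env://" traceback from the Windows run: one possible cause (an assumption, not confirmed anywhere in this thread) is that the installed PyTorch build on Windows does not register the env:// init method, and the NCCL backend is not available on Windows in any case. A rough sketch of choosing the arguments per platform is below. The helper name dist_init_kwargs and the sync-file path are hypothetical and are not part of the Swin-Transformer code.

```python
import platform
from pathlib import Path


def dist_init_kwargs(sync_file, system=None):
    """Pick backend/init_method for torch.distributed (hypothetical helper).

    Assumption: on Windows we avoid both NCCL and env://, and use gloo with
    a file:// rendezvous instead. Elsewhere we keep the repo's defaults.
    """
    system = platform.system() if system is None else system
    if system == "Windows":
        # file:// rendezvous needs a path that every process can reach.
        return {
            "backend": "gloo",
            "init_method": Path(sync_file).absolute().as_uri(),
        }
    return {"backend": "nccl", "init_method": "env://"}
```

The returned dict would be passed along as torch.distributed.init_process_group(**kwargs, rank=rank, world_size=world_size). With a file:// init method, rank and world_size have to be given explicitly. Whether this is enough depends on the PyTorch version, so treat it as something to try rather than a confirmed fix.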
Lower the GCC version; I changed mine to 6.5 and then it worked.
how?
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set
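For the "environment variable RANK expected, but not set" error, it may help to check what the process actually sees before init_process_group is called. The env:// init method reads MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE from the environment, and torch.distributed.launch normally sets them for each worker, so a common cause is starting main.py without the launcher. A small check, as a sketch only (the function name missing_rendezvous_vars is made up for illustration):

```python
import os

# Variables read by the env:// init method when rank and world size
# are not passed explicitly.
REQUIRED_VARS = ("MASTER_ADDR", "MASTER_PORT", "RANK", "WORLD_SIZE")


def missing_rendezvous_vars(env=None):
    """Return the env:// variables that are absent from `env`.

    `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if name not in env]


if __name__ == "__main__":
    missing = missing_rendezvous_vars()
    if missing:
        print("Not set:", ", ".join(missing))
        print("Start the script through torch.distributed.launch, "
              "or export these variables first.")
    else:
        print("All env:// variables are set.")
```

If this reports RANK as missing when run the same way as main.py, the script is probably not being started through the launcher, or the launcher's environment is not reaching the worker process.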