microsoft / Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
https://arxiv.org/abs/2103.14030
MIT License
13.99k stars, 2.06k forks

Error!! #17

Closed: yutinyang closed this issue 2 years ago

yutinyang commented 3 years ago

ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

zeliu98 commented 3 years ago

Hi, could you share the command and environment you used? Note that we use torch.distributed.launch to run jobs. An example is:

python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345  main.py \
--cfg configs/swin_tiny_patch4_window7_224.yaml --data-path <imagenet-path> --batch-size 128 

You can find more information in get_started.
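
Note: the "environment variable RANK expected, but not set" error usually means the script was started without the launcher, so the variables that env:// initialization reads were never exported. Below is a minimal sketch of a pre-flight check, not code from this repository. The helper name missing_dist_vars is hypothetical; the variable names are the ones env:// initialization reads.

```python
import os

# Variables read by init_method='env://'. torch.distributed.launch exports
# them for every worker process; a plain `python main.py` does not.
REQUIRED_VARS = ("RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")


def missing_dist_vars(environ=None):
    """Return the env:// variables that are not set (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if name not in env]


if __name__ == "__main__":
    missing = missing_dist_vars()
    if missing:
        print("Not started through a launcher; missing:", ", ".join(missing))
    else:
        print("Launcher environment detected, rank", os.environ["RANK"])
```

Running this check before init_process_group makes it clear whether the job was started with python -m torch.distributed.launch or with plain python.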

yutinyang commented 3 years ago

Thanks. I have changed my configuration to solve this problem.

daixiaolei623 commented 3 years ago

@yutinyang Hi, when I run "python tools/test.py /home/dai/code/semantic_segmentation/25/Swin-Transformer-Semantic-Segmentation-main/configs/swin/upernet_swin_base_patch4_window7_512x512_160k_ade20k.py /home/dai/code/semantic_segmentation/25/Swin-Transformer-Semantic-Segmentation-main/upernet_swin_base_patch4_window7_512x512.pth --eval mIoU" for semantic segmentation, I get the following error: "AssertionError: fused_bias_leakyrelu miss in module _ext". Could you please tell me how to solve it? Thank you.

daixiaolei623 commented 3 years ago

@yutinyang I am running on Ubuntu with CUDA 10.1 and Anaconda3.

yutinyang commented 3 years ago

python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py \
--cfg configs/swin_tiny_patch4_window7_224.yaml --data-path <imagenet-path> --batch-size 128

I used "python -m torch.distributed.launch --nproc_per_node 4 --master_port 12345 main.py \ --cfg configs/swin_tiny_patch4_window7_224.yaml --data-path --batch-size 128", but it doesn't work.

The error is raised at line 310: ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

guolonghui commented 3 years ago

@yutinyang Hi, has the problem been solved? I am running into it as well; see the output below:

(swin) G:\win10_tensorflow\Swin-Transformer-main>python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval --cfg configs/swin_base_patch4_window7_224.yaml --resume swin_base_patch4_window7_224.pth --data-path G:\image\imagenet
=> merge config from configs/swin_base_patch4_window7_224.yaml
RANK and WORLD_SIZE in environ: 0/1
Traceback (most recent call last):
  File "main.py", line 312, in <module>
    torch.distributed.init_process_group(backend='nccl', init_method='env://', world_size=world_size, rank=rank)
  File "G:\ProgramData\Anaconda3\envs\swin\lib\site-packages\torch\distributed\distributed_c10d.py", line 434, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "G:\ProgramData\Anaconda3\envs\swin\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for env://
Traceback (most recent call last):
  File "G:\ProgramData\Anaconda3\envs\swin\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "G:\ProgramData\Anaconda3\envs\swin\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "G:\ProgramData\Anaconda3\envs\swin\lib\site-packages\torch\distributed\launch.py", line 260, in <module>
    main()
  File "G:\ProgramData\Anaconda3\envs\swin\lib\site-packages\torch\distributed\launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['G:\ProgramData\Anaconda3\envs\swin\python.exe', '-u', 'main.py', '--local_rank=0', '--eval', '--cfg', 'configs/swin_base_patch4_window7_224.yaml', '--resume', 'swin_base_patch4_window7_224.pth', '--data-path', 'G:\image\imagenet']' returned non-zero exit status 1.
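
Note: the traceback above comes from Windows with backend='nccl'. NCCL is not available in Windows builds of PyTorch, and older Windows builds may lack distributed support entirely, which could explain the missing env:// handler. The following is a rough sketch of platform-aware backend selection, not the repository's code. The function choose_backend and its parameters are hypothetical names used only for illustration.

```python
import sys


def choose_backend(platform=None, cuda_available=False, nccl_available=False):
    """Pick a torch.distributed backend name (hypothetical helper).

    Falls back to "gloo" on Windows, where NCCL is not available,
    and on machines without CUDA or without an NCCL-enabled build.
    """
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        return "gloo"
    if cuda_available and nccl_available:
        return "nccl"
    return "gloo"


# Possible usage inside a training script (assumes torch is installed):
#   import torch
#   import torch.distributed as dist
#   if dist.is_available():
#       backend = choose_backend(
#           cuda_available=torch.cuda.is_available(),
#           nccl_available=dist.is_nccl_available(),
#       )
#       dist.init_process_group(backend=backend, init_method="env://")
```

If dist.is_available() returns False, the installed PyTorch build has no distributed support at all, and changing the backend alone will not help.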

liyiersan commented 3 years ago

This works for me. https://blog.csdn.net/qq_36622589/article/details/117913064

guolonghui commented 3 years ago

Lower the GCC version; I changed mine to 6.5 and it worked.
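
Note: before switching compilers, it may help to confirm which GCC version is actually active. The sketch below assumes gcc is on the PATH. The helper names are hypothetical, and the 6.5 figure comes only from this comment, not from any documented requirement.

```python
import re
import subprocess


def parse_gcc_version(text):
    """Extract the version tuple from the first line of `gcc --version` output."""
    lines = text.strip().splitlines()
    first_line = lines[0] if lines else ""
    # The compiler version is the last dotted number on the first line;
    # earlier ones may belong to the distribution's package string.
    matches = re.findall(r"\d+(?:\.\d+)+", first_line)
    if not matches:
        raise ValueError("could not find a version number in: %r" % first_line)
    return tuple(int(part) for part in matches[-1].split("."))


def active_gcc_version():
    """Run `gcc --version` and return its version tuple (needs gcc on PATH)."""
    result = subprocess.run(
        ["gcc", "--version"], capture_output=True, text=True, check=True
    )
    return parse_gcc_version(result.stdout)


if __name__ == "__main__":
    print("Active GCC version:", ".".join(map(str, active_gcc_version())))
```

How a different GCC version is installed and selected depends on the system, so this sketch only reports the current one.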

ymmm-4 commented 1 year ago

Lower the GCC version, mine is changed to 6.5 and it will work.

how?