Closed zpyi closed 2 years ago
Please update CUDA to version 11.1 or later; you can try the latest 11.3 release.
Thanks for your reply! The problem was solved by updating CUDA to 11.1 and updating mmcv.
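For anyone hitting the same error, here is a minimal sketch of how the ">= 11.1" requirement could be checked before launching training. The helper names `parse_version` and `meets_minimum` are made up for illustration and are not part of mmcv or mmtracking. The optional report assumes PyTorch is installed and only reads `torch.version.cuda` and `torch.cuda.get_device_capability`.

```python
def parse_version(text):
    """Turn a string such as '11.1' or '11.1.105' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split(".") if part.isdigit())


def meets_minimum(found, minimum="11.1"):
    """Return True if the CUDA version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


def report():
    """Print the CUDA version PyTorch was built with, if PyTorch is available."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; nothing to check.")
        return
    built_with = torch.version.cuda  # None for CPU-only builds
    print("PyTorch built with CUDA:", built_with)
    if built_with is not None:
        print("Meets 11.1 minimum:", meets_minimum(built_with))
    if torch.cuda.is_available():
        # An RTX 3090 is expected to report compute capability (8, 6).
        print("GPU 0 compute capability:", torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    report()
```

With the versions from this thread, `meets_minimum("11.0")` is false and `meets_minimum("11.3")` is true.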
Thanks for your error report and we appreciate it a lot.
Checklist
Describe the bug
Reproduction
Did you make any modifications on the code or config? Did you understand what you have modified?
I haven't changed the original code or config.
What dataset did you use and what task did you run?
ImageNet VID & DET, video object detection.
Environment
Please run python mmtrack/utils/collect_env.py to collect necessary environment information and paste it here.
sys.platform: linux
Python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
CUDA available: True
GPU 0,1,2,3: GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29069683_0
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:
TorchVision: 0.8.2
OpenCV: 4.5.3
MMCV: 1.3.11
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.0
MMTracking: 0.8.0+bab1abe
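The environment report lists NVCC from CUDA 11.1 but an MMCV CUDA compiler of 11.0. The thread does not confirm that this difference caused the error, only that updating CUDA and mmcv resolved it. As a rough sketch, the two fields could be compared automatically like this. The function name `cuda_versions` and the regular expressions are assumptions based on the field layout shown in this report, not an mmcv API.

```python
import re


def cuda_versions(env_text):
    """Return {field: 'major.minor'} for the CUDA-related fields that are present."""
    patterns = {
        "NVCC": r"NVCC:.*?cuda_(\d+\.\d+)",
        "MMCV CUDA Compiler": r"MMCV CUDA Compiler:\s*(\d+\.\d+)",
    }
    found = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, env_text)
        if match:
            found[name] = match.group(1)
    return found


# Two lines copied from the environment report in this issue.
sample = """NVCC: Build cuda_11.1.TC455_06.29069683_0
MMCV CUDA Compiler: 11.0"""

versions = cuda_versions(sample)
# More than one distinct version means the toolkit and the mmcv build disagree.
mismatch = len(set(versions.values())) > 1
print(versions, "mismatch:", mismatch)
```

If the check flags a mismatch, reinstalling an mmcv build compiled against the installed CUDA version is one thing to try.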
You may add additional information that may be helpful for locating the problem, such as
How you installed PyTorch [e.g., pip, conda, source]
Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)
Error traceback
Traceback (most recent call last):
  File "./tools/train.py", line 175, in <module>
    main()
  File "./tools/train.py", line 109, in main
    init_dist(args.launcher, cfg.dist_params)
  File "/home/zh/miniconda3/envs/mmtrack/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 20, in init_dist
    _init_dist_pytorch(backend, kwargs)
  File "/home/zh/miniconda3/envs/mmtrack/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 34, in _init_dist_pytorch
    dist.init_process_group(backend=backend, **kwargs)
  File "/home/zh/miniconda3/envs/mmtrack/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
    barrier()
  File "/home/zh/miniconda3/envs/mmtrack/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
    work = _default_pg.barrier()
RuntimeError: Broken pipe
Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!