lucasjinreal / yolov7_d2

🔥🔥🔥🔥 (Earlier YOLOv7 not official one) YOLO with Transformers and Instance Segmentation, with TensorRT acceleration! 🔥🔥🔥
GNU General Public License v3.0

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group. #14

Open concerttttt opened 2 years ago

concerttttt commented 2 years ago

Hi jinfagang,

Thanks for your amazing contribution.

Would you mind helping with the issue below? I'm not very familiar with detectron.

When I tried to run an experiment on the coco2017 dataset with the train_detr.py script and the detr_256_6_6_regnetx_0.4g.yaml config file, an error occurred during initialization.

ERROR [03/08 14:10:37 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 273, in run_step
    loss_dict = self.model(data)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/code/yolov7/yolov7/modeling/meta_arch/detr.py", line 165, in forward
    output = self.detr(images)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/code/yolov7/yolov7/modeling/meta_arch/detr.py", line 449, in forward
    features, pos = self.backbone(samples)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/code/yolov7/yolov7/modeling/backbone/detr_backbone.py", line 504, in forward
    xs = self[0](tensor_list)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/code/yolov7/yolov7/modeling/meta_arch/detr.py", line 356, in forward
    features = self.backbone(images.tensor)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/detectron2/modeling/backbone/regnet.py", line 315, in forward
    x = self.stem(x)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/detectron2/modeling/backbone/regnet.py", line 87, in forward
    x = layer(x)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 732, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 845, in get_world_size
    return _get_group_size(group)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 306, in _get_group_size
    default_pg = _get_default_group()
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 411, in _get_default_group
    "Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

Could you give me a hint on how to fix this issue?

Thanks, Yuxin.

lucasjinreal commented 2 years ago

Make sure you are using the right command line to start training.
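
A note on the likely cause, reading the traceback above: the failing call is in torch/nn/modules/batchnorm.py, where what appears to be a synchronized batch-norm layer asks torch.distributed for the world size. That call needs an initialized process group, which a plain single-process run normally does not have. The error therefore usually goes away either by launching training across several GPUs/processes or by switching the backbone norm from "SyncBN" to "BN". The sketch below only illustrates that decision. `pick_norm` is a hypothetical helper, not part of detectron2 or this repo, and the config key that holds the norm setting for this model is not shown in the thread, so it is left out.

```python
def pick_norm(requested: str, world_size: int) -> str:
    """Return a norm name that is safe for the given number of processes.

    Hypothetical helper for illustration only. "SyncBN" needs an
    initialized torch.distributed process group, so with a single
    process we fall back to plain "BN". Other norm names pass through.
    """
    if requested == "SyncBN" and world_size <= 1:
        return "BN"
    return requested


if __name__ == "__main__":
    # Single-process run: SyncBN would hit the RuntimeError, so use BN.
    print(pick_norm("SyncBN", world_size=1))
    # Multi-process run: SyncBN can be kept.
    print(pick_norm("SyncBN", world_size=4))
```

In practice this means either editing the norm setting in the YAML config for single-GPU runs, or starting the script with the multi-GPU command line the repo expects so that the process group gets initialized.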