skyhehe123 / SA-SSD

SA-SSD: Structure Aware Single-stage 3D Object Detection from Point Cloud (CVPR 2020)

multi-GPU training error #91

Open vehxianfish opened 3 years ago

vehxianfish commented 3 years ago

I am using multi-GPU training, but the following error occurs:

Traceback (most recent call last):
  File "./train.py", line 131, in <module>
    main()
  File "./train.py", line 82, in main
    model = MMDistributedDataParallel(model.cuda(),find_unused_parameters=True)
  File "/home/ubuntu-502/xu/CIA-SSD/envs/CIA-SSD/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 305, in __init__
    self._ddp_init_helper()
  File "/home/ubuntu-502/xu/CIA-SSD/envs/CIA-SSD/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 323, in _ddp_init_helper
    self._module_copies = replicate(self.module, self.device_ids, detach=True)
  File "/home/ubuntu-502/xu/CIA-SSD/envs/CIA-SSD/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 88, in replicate
    param_copies = _broadcast_coalesced_reshape(params, devices, detach)
  File "/home/ubuntu-502/xu/CIA-SSD/envs/CIA-SSD/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 67, in _broadcast_coalesced_reshape
    return comm.broadcast_coalesced(tensors, devices)
  File "/home/ubuntu-502/xu/CIA-SSD/envs/CIA-SSD/lib/python3.6/site-packages/torch/cuda/comm.py", line 39, in broadcast_coalesced
    return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]

Has anyone run into this problem, or can anyone help me check this error? Thank you very much.
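
A possible cause, read off the traceback rather than confirmed against this repository: the wrapper on line 82 of train.py is created without device_ids. In older PyTorch releases, DistributedDataParallel then defaults to every visible GPU and calls replicate() inside a single process, which requires all parameters to be on devices[0]. A process whose current GPU is not the first one fails with exactly this message. The sketch below builds keyword arguments that pin the wrapper to one GPU per process. The helper name single_device_ddp_kwargs and the variable local_rank are hypothetical and not part of SA-SSD; how train.py obtains the local rank is an assumption.

```python
def single_device_ddp_kwargs(local_rank, find_unused_parameters=True):
    """Keyword arguments that pin a DDP wrapper to exactly one GPU.

    `local_rank` is this process's GPU index on the current node
    (hypothetical name; train.py may obtain it differently).
    """
    if local_rank < 0:
        raise ValueError("local_rank must be non-negative")
    return {
        "device_ids": [local_rank],    # replicate only onto this process's GPU
        "output_device": local_rank,   # keep outputs on the same GPU
        "find_unused_parameters": find_unused_parameters,
    }


# Assumed use in train.py (not executed here, needs torch and GPUs):
#   torch.cuda.set_device(local_rank)
#   model = MMDistributedDataParallel(
#       model.cuda(), **single_device_ddp_kwargs(local_rank))
print(single_device_ddp_kwargs(1))
```

If the launcher already calls torch.cuda.set_device for each process, passing device_ids=[torch.cuda.current_device()] directly should have the same effect.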