Hi all, it seems there is a bug in PointPillars when training on multiple GPUs. I think the problem is the batch index in the voxel coordinates. Say there are 4 samples in a batch, [0, 1, 2, 3], and two GPUs. When the batch is split, the first GPU gets [0, 1] and the second gets [2, 3]. This is fine for the first GPU, but PointPillarsScatter always counts batch indices from 0, so the data on the second GPU is treated as an empty point cloud. A temporary remedy I use now is to add the line

`coors[:, 0] -= coors[:, 0].min()`

after line 370 in voxelnet.py.
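To make the failure mode concrete, here is a toy sketch (NumPy instead of the actual tensors in voxelnet.py; the coordinate values are made up) showing why the shard on the second GPU looks empty to a scatter that indexes canvases from 0, and what the rebase does:

```python
import numpy as np

# Toy voxel coordinates: column 0 is the batch index, the rest are
# spatial coordinates. Assume a global batch of 4 split across 2 GPUs,
# so the shard on the second GPU carries batch indices 2 and 3.
coors = np.array([
    [2, 0, 5],  # voxel belonging to sample 2
    [2, 1, 7],
    [3, 4, 2],  # voxel belonging to sample 3
])

# A scatter that allocates canvases for local indices 0..1 would match
# nothing here: no row has batch index 0 or 1, so both canvases stay
# empty. Rebasing makes each shard's indices start from 0 again:
coors[:, 0] -= coors[:, 0].min()

print(coors[:, 0].tolist())  # -> [0, 0, 1]
```

Note this assumes each GPU's shard holds a contiguous range of batch indices and that at least one voxel from the shard's first sample is present; otherwise subtracting the minimum would not align indices correctly.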