harryseely closed this issue 1 year ago
DistributedDataParallel is supported. For example, run the following command to train with 4 GPUs:
python classification.py --config configs/cls_m40.yaml SOLVER.gpu 0,1,2,3
I need to use DataParallel because I am working on a Windows machine. I actually chose this O-CNN implementation because it is the only sparse-CNN implementation that does not require Linux (which Minkowski Engine, Submanifold Sparse CNN, torchsparse, etc. all do).
Ok this makes sense, thank you!
I am trying to increase the depth of my octree beyond 5, but I am running out of memory. As a workaround, I would like to use the PyTorch multi-GPU DataParallel wrapper. However, when I wrap any OCNN model in nn.DataParallel, I run into the following error:
AssertionError: The shape of input data is wrong.
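For context, the wrapping itself is just the standard pattern. A minimal sketch (the nn.Linear is a placeholder for the real OCNN network, which I cannot reproduce here):

```python
import torch
import torch.nn as nn

# Placeholder for the OCNN model; any module whose layers cache an
# expected input shape at setup time would hit the same assertion.
model = nn.Linear(3, 8)

if torch.cuda.device_count() > 1:
    # DataParallel scatters the input along dim 0 across the GPUs,
    # so each replica sees roughly half of the points.
    model = nn.DataParallel(model)

x = torch.randn(31291, 3)  # full batch of points
out = model(x)
```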
DataParallel splits the data across the GPUs (2 in my case). The error is traced to this line of code in octree_conv.py:
check = tuple(data.shape) == self.in_shape
Here tuple(data.shape) is (15646, 3), whereas self.in_shape is (31291, 3). It appears that DataParallel is half working: data.shape[0] * 2 = 31292, which is only 1 off from self.in_shape[0] = 31291. This means the data is being split across the GPUs, but self.in_shape is not updated to match...
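The off-by-one is consistent with how an odd-sized first dimension gets chunked: the first replica receives the ceiling of n / num_gpus rows. A small arithmetic sketch using the numbers from the error above (variable names are illustrative, not from the ocnn source):

```python
# Simulate DataParallel splitting 31291 rows across 2 GPUs while the
# conv layer still holds the shape cached from the full input.
cached_in_shape = (31291, 3)   # self.in_shape, computed from the whole octree
num_gpus = 2

n = cached_in_shape[0]
# chunking gives the first replica ceil(n / num_gpus) rows
per_replica = -(-n // num_gpus)

print(per_replica)             # 15646, the shape seen in the error
print(per_replica * num_gpus)  # 31292, 1 more than the cached 31291
```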
Any idea what might be stopping DataParallel from working in this case? Could you test this?
Thanks!