traveller59 / spconv

Spatial Sparse Convolution Library
Apache License 2.0

Excessive memory usage on high-dimensional input data #46

Closed: franciscorubin closed this issue 2 years ago

franciscorubin commented 5 years ago

I'm processing a point cloud with 40k points, but it is divided into a voxel grid with large dimensions: [5000, 3000, 200].

The problem appears when I run it through a SubMConv3d layer with a kernel_size of 3 and input/output channels of 1. It tries to allocate 20 GB of GPU memory and crashes.

I traced the error to the function "getIndicePair" in spconv_ops.h, specifically these lines:

  torch::Tensor gridOut =
      torch::full({batchSize * outputVolume}, -1,
                  torch::dtype(torch::kInt32).device(indices.device()));

In my case the outputVolume is 5000 × 3000 × 200 = 3,000,000,000 cells, which causes the crash.
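For reference, a quick back-of-the-envelope estimate of that single allocation (assuming batch size 1, which is not stated here, and int32 entries as in the snippet above):

    #include <cstdint>
    #include <cstdio>

    int main() {
      // Numbers from this issue: batch size 1 (assumed) and a spatial shape of
      // [5000, 3000, 200], stored as int32 like gridOut above.
      const int64_t batchSize = 1;
      const int64_t outputVolume = 5000LL * 3000 * 200;  // 3,000,000,000 voxels
      const double gigabytes =
          static_cast<double>(batchSize * outputVolume * sizeof(int32_t)) / 1e9;
      std::printf("dense lookup grid alone: %.1f GB\n", gigabytes);  // ~12 GB
      return 0;
    }

The dense grid alone is roughly 12 GB; the other buffers allocated during rule generation presumably account for the rest of the 20 GB.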

Why do we need to create a matrix that big? This algorithm is supposed to work on sparse data, so shouldn't the intermediate matrices also be sparse? In a submanifold sparse CNN the output cells are exactly the input cells, and everything else stays 0. Why do we need a full dense grid for it?

traveller59 commented 5 years ago

To build the indice pairs, we need to check whether a location contains a point, so I use a dense grid as a lookup table. Solving the memory problem would require a CUDA hash table implementation. For now, consider using the CPU indice pair generation algorithm in the first stage (move the coordinates to CPU; a large matrix is still needed, but in CPU memory), then convert the coordinates back to CUDA to use the GPU rule generation.
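For illustration only, a minimal sketch of how a hash-based lookup could replace the dense gridOut table, keyed by the flattened voxel coordinate. The names here (flattenCoord, buildLookup) are made up for this sketch and this is not spconv's actual implementation:

    #include <array>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Flatten (batch, z, y, x) into a single 64-bit key usable in a hash map.
    inline int64_t flattenCoord(int64_t b, int64_t z, int64_t y, int64_t x,
                                const std::array<int64_t, 3>& shape) {
      return ((b * shape[0] + z) * shape[1] + y) * shape[2] + x;
    }

    // Build the lookup table from the active input coordinates. Memory usage is
    // O(number of active voxels) instead of O(batchSize * outputVolume).
    std::unordered_map<int64_t, int32_t> buildLookup(
        const std::vector<std::array<int64_t, 4>>& indices,  // (batch, z, y, x)
        const std::array<int64_t, 3>& shape) {
      std::unordered_map<int64_t, int32_t> lookup;
      lookup.reserve(indices.size());
      for (int32_t i = 0; i < static_cast<int32_t>(indices.size()); ++i) {
        const auto& c = indices[i];
        lookup[flattenCoord(c[0], c[1], c[2], c[3], shape)] = i;
      }
      return lookup;
    }

With something like this, checking whether a neighbor voxel is active becomes a lookup.find(key) instead of a read from the dense grid, at the cost of hash probing that is slower and harder to parallelize on GPU, which is why a CUDA hash table would be needed for the GPU path.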

In addition, spconv may be slower than SparseConvNet on other types of point clouds (I have only tested my code on LiDAR point clouds). SparseConvNet's CPU indice pair algorithm uses a hash table, so it doesn't have this memory problem. I'll try the cudpp hash table.