lxxue / FRNN

Fixed Radius Nearest Neighbor Search on GPU

CUDA OOM(potential enhancement) #4

Closed ShengyuH closed 3 years ago

ShengyuH commented 3 years ago

Hi,

Thanks for providing this awesome toolbox. I get an OOM error when using it. Here is a minimal example. Data: https://drive.google.com/file/d/1X_8xmTvGwzv8FxXif2VaqtT_oO07_WwT/view?usp=sharing Code snippet:

import frnn
import torch

if __name__=='__main__':
    device = torch.device('cuda')
    points = torch.load('dump/pts.pth')[None,:,:].to(device).float()
    n_points = torch.tensor([points.size(1)]).to(device).long()
    K=10
    radius = 0.05
    print(points.size(), n_points)
    _, idxs, _,_ = frnn.frnn_grid_points(points, points, n_points, n_points, K, radius, grid=None, return_nn=False, return_sorted=False)

Error message:

$ python minimal_sample.py 
torch.Size([1, 156424, 3]) tensor([156424], device='cuda:0')
Traceback (most recent call last):
  File "minimal_sample.py", line 11, in <module>
    _, idxs, _,_ = frnn.frnn_grid_points(points, points, n_points, n_points, K, radius, grid=None, return_nn=False, return_sorted=False)
  File "/scratch2/shengyu/spv/lib/python3.8/site-packages/frnn-0.0.0-py3.8-linux-x86_64.egg/frnn/frnn.py", line 331, in frnn_grid_points
    idxs, dists, sorted_points2, pc2_grid_off, sorted_points2_idxs, grid_params_cuda = _frnn_grid_points.apply(
  File "/scratch2/shengyu/spv/lib/python3.8/site-packages/frnn-0.0.0-py3.8-linux-x86_64.egg/frnn/frnn.py", line 137, in forward
    pc1_grid_cnt = torch.zeros((N, G),
RuntimeError: CUDA out of memory. Tried to allocate 7.51 GiB (GPU 0; 23.70 GiB total capacity; 15.03 GiB already allocated; 7.00 GiB free; 15.04 GiB reserved in total by PyTorch)

Is this because you are using dense grids? I think that with a sparse hash I could still handle such point clouds. I'd really appreciate it if you could provide some hints.

Best, Shengyu

lxxue commented 3 years ago

Hi Shengyu,

Thanks for the example to reproduce the error. The problem is that when the bounding box of the point cloud is much larger than the search radius, I use a default grid resolution (see the variable grid_max_res in Python and GRID_3D_MAX_RES / GRID_2D_MAX_RES in C) to avoid a huge grid. The default value was 128, which is too large for most GPUs. I have now set it to 64, and it works on my GPU with 8 GB of memory.
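To illustrate why the resolution cap matters, here is a minimal sketch of how the cell count of a dense uniform grid grows with per-axis resolution. It is not FRNN's actual code: the function names, the example bounding box, the cell size, and the 4-byte counter size are all assumptions made for illustration.

```python
import math

def dense_grid_cells(bbox_min, bbox_max, cell_size, max_res=None):
    """Count cells in a dense uniform grid over an axis-aligned box.

    Hypothetical helper: each axis gets ceil(extent / cell_size) cells,
    optionally clamped to max_res (the role grid_max_res plays in the thread).
    """
    cells = 1
    for lo, hi in zip(bbox_min, bbox_max):
        res = max(1, math.ceil((hi - lo) / cell_size))
        if max_res is not None:
            res = min(res, max_res)
        cells *= res
    return cells

def counter_bytes(num_clouds, cells, bytes_per_counter=4):
    """Bytes for one per-cell counter array of shape (num_clouds, cells).

    The 4-byte counter is an assumption, not taken from the library.
    """
    return num_clouds * cells * bytes_per_counter

# Example box of 100 x 100 x 100 units with 0.5-unit cells (made-up values).
bbox_min, bbox_max = (0.0, 0.0, 0.0), (100.0, 100.0, 100.0)
for cap in (None, 128, 64):
    cells = dense_grid_cells(bbox_min, bbox_max, 0.5, max_res=cap)
    print(cap, cells, counter_bytes(1, cells))
```

Because the cell count is the product of the per-axis resolutions, halving the cap from 128 to 64 in 3D reduces the number of cells by a factor of 8.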

As for the sparse grid suggestion, I feel I would have to rewrite the code completely, and I'm not sure whether the overhead would be marginal. I will check out this repo later. Thanks for the reference!

Best, Lixin

yuhao commented 3 years ago

I still have OOM issues. I am using K=50, D=3, radius=2. I intentionally set radius_cell_ratio to a very small number (0.001) so that the cell size is big. The bounding box of the point cloud (the scene boundary) is: (-80.000000, -3.000000, -80.000000), (80.000000, 25.000000, 80.000000). The error happens at:

File "lib/python3.6/site-packages/frnn-0.0.0-py3.6-linux-x86_64.egg/frnn/frnn.py", line 368, in frnn_grid_points
    return_sorted, radius_cell_ratio)
File "lib/python3.6/site-packages/frnn-0.0.0-py3.6-linux-x86_64.egg/frnn/frnn.py", line 210, in forward
    K, r, r * r)

What's weird is that no matter what value I set radius_cell_ratio to, it always seems to try to allocate the same amount of device memory.

lxxue commented 3 years ago

Could you give me a minimal example and data to reproduce the error?

yuhao commented 3 years ago

This point cloud file (https://drive.google.com/file/d/1IJL0va_l2QTLB4qb_HCvgpbPhp14OKjX/view?usp=sharing) has 6,222,091 points.

The search code is: frnn.frnn_grid_points(pc, pc, lengths1=None, lengths2=None, K=100, r=2, radius_cell_ratio=2)

The error is: RuntimeError: CUDA out of memory. Tried to allocate 2.32 GiB (GPU 1; 7.80 GiB total capacity; 5.00 GiB already allocated; 1.87 GiB free; 5.01 GiB reserved in total by PyTorch)

I did the calculation. The returned result requires 6222091 × 100 × 4 bytes ≈ 2.32 GiB. So it seems that other data structures have taken up so much device memory that the return buffer could not be allocated.
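The arithmetic above can be checked with a short sketch. The function name is hypothetical; the point count, K, and the 4-byte element size are the figures quoted in this comment.

```python
def knn_buffer_bytes(num_points, k, bytes_per_element=4):
    """Bytes for one (num_points, k) result buffer with fixed-size elements."""
    return num_points * k * bytes_per_element

size = knn_buffer_bytes(6_222_091, 100)
print(size)                    # 2488836400
print(round(size / 2**30, 2))  # 2.32 (GiB, matching the error message)
```

This covers a single K-nearest-neighbor output buffer only; any additional outputs or intermediate tensors add to it.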

lxxue commented 3 years ago

Since we store the sorted versions of both point clouds, we actually have 3 point clouds of 2.32 GB (input pc, sorted pc1, sorted pc2), which take up 7 GB of memory. So an OOM error is to be expected for this pc. I will add support for pc1 and pc2 being the same point cloud later. In that case, we will only need two pcs of 2.32 GB.

Thanks for the example. I found two more small bugs thanks to it.