traveller59 / second.pytorch

SECOND for KITTI/NuScenes object detection
MIT License

Memory bottleneck when increasing voxel resolution #82

Open Benzlxs opened 5 years ago

Benzlxs commented 5 years ago

Hey Yan,

I am trying to decrease the voxel size from [0.05, 0.05, 0.1] (your original setting) to [0.025, 0.025, 0.0325], and max_number_of_points_per_voxel is also decreased from 5 to 2. Training runs smoothly, but there is a GPU memory bottleneck: with batch_size=1, GPU memory consumption is 9851 MB, compared to your 7633 MB with batch_size=3. This does not seem to make sense, but I cannot figure out the reason.
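For scale, a quick back-of-envelope sketch (not part of the original report; it assumes the KITTI range 0 ~ 70.4, -40 ~ 40, -3 ~ 1 that comes up later in this thread): halving the x/y voxel size and using a roughly 3x finer z multiplies the dense grid cell count by about 12x.

range_xyz = [70.4, 80.0, 4.0]  # axis extents in meters
old_vs, new_vs = [0.05, 0.05, 0.1], [0.025, 0.025, 0.0325]
cells = lambda vs: [round(r / v) for r, v in zip(range_xyz, vs)]
print(cells(old_vs))  # [1408, 1600, 40]
print(cells(new_vs))  # [2816, 3200, 123] -> ~12x more cells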

Do you have any idea what causes this GPU memory usage?

traveller59 commented 5 years ago

spconv uses a dense table to determine whether a location is active. For example, with a range of 0 ~ 70.4, -40 ~ 40, -3 ~ 1 and voxel size [0.025, 0.025, 0.05], we need 2800 x 3200 x 80 x 4 bytes ≈ 3 GB of memory per example (although your memory usage is still too high even so). You can pre-allocate a grid buffer and use:

# in __init__: pre-allocate the dense table once; it must be filled with -1
self.grid = torch.full([self.max_batch_size, *sparse_shape], -1, dtype=torch.int32).cuda()
# in forward: attach the buffer to the sparse tensor
x = spconv.SparseConvTensor(...)
x.grid = self.grid

All sparse convolutions applied to this sparse tensor will then reuse the pre-allocated dense table instead of allocating a new one on every call.
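A minimal end-to-end sketch of this pattern, assuming spconv 1.x; the module name, channel sizes, and max_batch_size default are illustrative, not from the repo:

import torch
import spconv

class SparseMiddle(torch.nn.Module):
    def __init__(self, sparse_shape, max_batch_size=3):
        super().__init__()
        self.sparse_shape = sparse_shape
        # pre-allocate the dense lookup table once; -1 marks an inactive cell
        self.grid = torch.full([max_batch_size, *sparse_shape], -1,
                               dtype=torch.int32).cuda()
        self.net = spconv.SparseSequential(
            spconv.SubMConv3d(4, 16, 3, indice_key="subm0"),
        )

    def forward(self, voxel_features, coords, batch_size):
        # coords: int32 [N, 4] with the batch index in column 0
        x = spconv.SparseConvTensor(voxel_features, coords,
                                    self.sparse_shape, batch_size)
        x.grid = self.grid  # every sparse conv on x now reuses this buffer
        return self.net(x)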

Note: the memory usage stated in the config file is wrong. I can run batch_size=8 with car.fhd.config in 11 GB of memory.

Benzlxs commented 5 years ago

Thank you for your reply. I followed your suggestion and pre-allocated the dense table, but memory consumption is similar. My self.sparse_shape is [127, 3199, 2815], so 127 * 3199 * 2815 * 4 bytes ≈ 4.365 GB; that number matches what I see with the nvidia-smi command. I then checked step by step and found that middle.py takes up around 1.0 GB and rpn.py consumes 1.586 GB, so the total comes to almost 7 GB, which seems to make sense?
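Redoing that arithmetic as a sanity check (a sketch; the exact value depends on dividing by 1e9 or 1024^3, but either way the total lands near the observed 7 GB):

grid_gib = 127 * 3199 * 2815 * 4 / 1024**3  # int32 dense table -> ~4.26 GiB
total = grid_gib + 1.0 + 1.586              # + middle.py + rpn.py -> ~6.85
print(grid_gib, total)                      # close to the ~7 GB observed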