nv-tlabs / NKSR

[CVPR 2023 Highlight] Neural Kernel Surface Reconstruction
https://research.nvidia.com/labs/toronto-ai/NKSR

OOM when doing marching cube in KITTI #17

Closed JIANG-CX closed 1 year ago

JIANG-CX commented 1 year ago

Hello, when I run tests on the KITTI dataset, I encounter an OOM (out of memory) error during the marching cubes step (`dual_v, dual_f = MarchingCubes().apply(dmc_graph, dmc_vertices.cuda(), dmc_value.cuda())`). Is there a way to perform marching cubes on the dataset in chunks to reduce memory consumption? Could you please advise on how to achieve this?

heiwang1997 commented 1 year ago

Thanks for the suggestion @JIANG-CX !

We don't have a chunked version of marching cubes yet. The main challenge is properly stitching chunked sparse grids that have different transformations in the dual marching cubes setting. However, there are three ways to work around this in practice:

  1. Manually crop your input into chunks, reconstruct each chunk, and stitch the resulting meshes together using algorithms from, e.g., MeshLab.
  2. (Recommended) Convert `dmc_vertices` and `dmc_value` to the CPU device and run marching cubes there. All the operations are implemented for both CUDA and CPU.
  3. Alternatively, perform the entire mesh extraction stage on the CPU, which is the most economical in terms of GPU memory. Please refer to #8 for how this can be achieved.
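For option 1, the cropping itself can be sketched as below. This is a minimal illustration, not part of NKSR: the function name, chunk size, and overlap are all hypothetical, and the overlap is there so that neighboring chunk meshes share geometry you can later stitch or trim in MeshLab.

```python
import numpy as np

def crop_into_chunks(points, chunk_size=50.0, overlap=2.0):
    """Split an (N, 3) point array into axis-aligned chunks with overlap.

    Each occupied grid cell of side `chunk_size` yields one chunk; the
    crop box is expanded by `overlap` on every side so adjacent chunks
    share a band of points for stitching. Values are illustrative.
    """
    mins = points.min(axis=0)
    # Integer cell index of each point along x, y, z.
    cell = np.floor((points - mins) / chunk_size).astype(int)
    chunks = []
    for idx in np.unique(cell, axis=0):  # one crop per occupied cell
        lo = mins + idx * chunk_size - overlap
        hi = mins + (idx + 1) * chunk_size + overlap
        mask = np.all((points >= lo) & (points <= hi), axis=1)
        chunks.append(points[mask])
    return chunks

# Example: points spread over a KITTI-scale 200 m x 200 m x 20 m volume.
pts = np.random.rand(10000, 3) * np.array([200.0, 200.0, 20.0])
chunks = crop_into_chunks(pts)
# Each chunk would then go through reconstruction + mesh extraction on its own.
```

Each chunk can then be reconstructed independently (keeping peak memory bounded by the largest chunk) and the per-chunk meshes merged afterwards.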
JIANG-CX commented 1 year ago

OK, thanks for your answer.