[chunk_size] causes CUDA out of memory.

qpc001 commented 1 year ago

I'm using recons_waymo.py. Setting [chunk_size=50] requires much more GPU memory, and the result is CUDA out of memory.

Does [chunk_size] reduce the required CPU (system) memory or the required GPU memory?

heiwang1997 commented 1 year ago

Using the chunked mode will not decrease GPU memory usage unless you specify:

reconstructor.chunk_tmp_device = torch.device("cpu")

before reconstruction.

The above code will move the reconstructed chunks into the CPU memory, thus saving the active GPU memory needed.
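A toy model of why offloading matters, as a minimal sketch. It assumes that each chunk's finished output is the only thing kept between chunks and that the per-chunk working set already includes that chunk's output. `peak_gpu_memory` and its inputs are hypothetical, not part of NKSR, and real usage also depends on PyTorch's caching allocator.

```python
from typing import Sequence


def peak_gpu_memory(working: Sequence[float], result: Sequence[float],
                    offload_results: bool) -> float:
    """Toy model (arbitrary units) of peak GPU memory in a chunked pipeline.

    working[i]: GPU memory needed while chunk i is solved (assumed to
                include chunk i's own output).
    result[i]:  size of chunk i's output once it is finished.
    offload_results: True mimics chunk_tmp_device = cpu, i.e. each finished
                output leaves the GPU before the next chunk starts.
    """
    if len(working) != len(result):
        raise ValueError("working and result must have the same length")
    kept = 0.0  # finished outputs still resident on the GPU
    peak = 0.0
    for w, r in zip(working, result):
        peak = max(peak, kept + w)  # memory in use while solving this chunk
        if not offload_results:
            kept += r  # output stays on the GPU and accumulates
    return max(peak, kept)
```

With hypothetical sizes of 4 units of working memory and 1 unit of output per chunk, `peak_gpu_memory([4, 4, 4], [1, 1, 1], False)` gives 6 while offloading gives 4, so the gap grows with the number of chunks. The model says nothing about other overheads of chunked mode.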

Hope this helps!!

qpc001 commented 1 year ago

This line is already in the script:

reconstructor.chunk_tmp_device = torch.device("cpu")

But I don't know why enabling [chunk_size] mode uses more CUDA memory than disabling it.

The problem is that I can reconstruct the surface with [chunk_size] disabled, but I get CUDA out of memory when it is enabled.

heiwang1997 commented 1 year ago

I see. That's weird. What is your hardware setup? How much GPU memory do you have? By the way, one last resort would be to use device='cpu' throughout the pipeline, but that will be slow.
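The "last resort" of running on the CPU can be automated with a small retry helper, sketched here under assumptions: `run_with_device_fallback` is hypothetical (not an NKSR API), tries each device in order, and only falls back on out-of-memory errors. In PyTorch 1.13 and newer, CUDA OOM is raised as `torch.cuda.OutOfMemoryError`, which can be passed as `oom_errors`.

```python
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_device_fallback(
    run: Callable[[str], T],
    devices: Sequence[str],
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> Tuple[str, T]:
    """Try run(device) on each device in order, moving on only after an OOM.

    Returns (device_used, result). Re-raises the last OOM error if every
    device fails; any other exception propagates immediately.
    """
    if not devices:
        raise ValueError("at least one device is required")
    for i, device in enumerate(devices):
        try:
            return device, run(device)
        except oom_errors:
            if i == len(devices) - 1:
                raise  # nothing left to fall back to
    raise AssertionError("unreachable")


# Hypothetical usage with NKSR (assumes input_xyz / input_sensor from the
# script in this thread; inputs must be on the reconstructor's device):
#   import torch, nksr
#   def run(dev):
#       device = torch.device(dev)
#       reconstructor = nksr.Reconstructor(device)
#       return reconstructor.reconstruct(
#           input_xyz.to(device), sensor=input_sensor.to(device), detail_level=None)
#   device, field = run_with_device_fallback(
#       run, ["cuda:0", "cpu"], (torch.cuda.OutOfMemoryError,))
```

Calling `torch.cuda.empty_cache()` before the CPU attempt may return cached blocks to the driver, but the CPU run itself will still be much slower.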

qpc001 commented 1 year ago

I am using an RTX 3060 with 12 GB.

heiwang1997 commented 1 year ago

Here is the code I used. I simulated a 12 GB GPU by running another torch program that occupies 13 GB of memory on my 3090:

import nksr
import torch

from pycg import vis, exp
from pathlib import Path
import numpy as np
from common import load_waymo_example, warning_on_low_memory

if __name__ == '__main__':
    xyz_np, sensor_np = load_waymo_example()

    device = torch.device("cuda:0")
    reconstructor = nksr.Reconstructor(device)
    reconstructor.chunk_tmp_device = torch.device("cpu")

    input_xyz = torch.from_numpy(xyz_np).float().to(device)
    input_sensor = torch.from_numpy(sensor_np).float().to(device)

    field = reconstructor.reconstruct(
        input_xyz, sensor=input_sensor, detail_level=None,
        # Minor configs for better efficiency (not necessary)
        approx_kernel_grad=True, solver_tol=1e-4, fused_mode=True, 
        # Chunked reconstruction (if OOM)
        chunk_size=50.0,
        preprocess_fn=nksr.get_estimate_normal_preprocess_fn(64, 85.0)
    )
    mesh = field.extract_dual_mesh(mise_iter=1)
    mesh = vis.mesh(mesh.v, mesh.f)

    vis.show_3d([mesh], [vis.pointcloud(xyz_np)])

and the code runs fine without OOM:

(nksr) huangjh@ws:~/shared-home/nkf-wild/nksr-train$ python examples/recons_waymo.py 
06-13 19:51:00 (common.py:87) [WARNING] Available GPU memory is 10757.06 MB, we recommend you to have more than 20000.00 MB available. 
nksr.chunk: 100%|█████████████████████████████| 14/14 [00:12<00:00,  1.38s/it]
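Occupying memory with a second program is one way to emulate a smaller card. Another option, sketched here, is `torch.cuda.set_per_process_memory_fraction`, which caps PyTorch's caching allocator for the current process. `memory_fraction_for_budget` is a hypothetical helper, and the cap does not cover the CUDA context or memory held by other processes, so it only approximates a real 12 GB card.

```python
GIB = 1024 ** 3


def memory_fraction_for_budget(budget_bytes: int, total_bytes: int) -> float:
    """Fraction of a GPU's total memory corresponding to budget_bytes.

    Clamped to [0, 1], the range accepted by
    torch.cuda.set_per_process_memory_fraction.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return min(1.0, max(0.0, budget_bytes / total_bytes))


# Hypothetical usage (requires a CUDA device):
#   import torch
#   total = torch.cuda.get_device_properties(0).total_memory
#   fraction = memory_fraction_for_budget(12 * GIB, total)
#   # Allocations beyond this cap on cuda:0 then fail with CUDA OOM.
#   torch.cuda.set_per_process_memory_fraction(fraction, 0)
```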

Maybe your torch version is non-standard? Did you use the environment provided in environment.yml?
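One way to check the torch build is to derive a wheel tag from the installed version strings and compare it with the index used by the pip command later in this thread. This is a minimal sketch that assumes the naming follows that single example (`torch-2.0.0+cu118`); `expected_wheel_tag` is hypothetical, and wheels may not be published for other combinations.

```python
from typing import Optional


def expected_wheel_tag(torch_version: str, cuda_version: Optional[str]) -> Optional[str]:
    """Build a tag like 'torch-2.0.0+cu118' from torch's version strings.

    torch_version: e.g. torch.__version__, '2.0.0+cu118' (pip) or '2.0.0' (conda).
    cuda_version:  e.g. torch.version.cuda, '11.8'; None for CPU-only builds.
    Returns None for CPU-only builds, since the only index named in this
    thread is a CUDA one.
    """
    if cuda_version is None:
        return None
    base = torch_version.split("+", 1)[0]  # drop a local suffix such as '+cu118'
    return f"torch-{base}+cu{cuda_version.replace('.', '')}"


# Hypothetical usage:
#   import torch
#   print(expected_wheel_tag(torch.__version__, torch.version.cuda))
```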

qpc001 commented 1 year ago

I am not using the Waymo dataset, haha.

By the way, will [chunk_size] mode make it possible to reconstruct a very large-scale scene on a device with little CUDA memory, such as an RTX 3060?

heiwang1997 commented 1 year ago

Yes, this is definitely doable; all you need to do is extract the mesh on the CPU. I will provide more information after we update the wheels (there is one bug that I just fixed).

heiwang1997 commented 1 year ago

@qpc001 Hi, please install the newest nksr package:

pip install -U nksr -f https://nksr.huangjh.tech/whl/torch-2.0.0+cu118.html

and refer to the recipe here: https://github.com/nv-tlabs/NKSR/blob/public/NKSR-USAGE.md#running-on-a-device-with-small-memory to reconstruct with small memory.

Thank you!

qpc001 commented 1 year ago

It works. Nice.

nv-tlabs / NKSR, issue #8