Closed qpc001 closed 1 year ago
Using the chunked mode will not decrease GPU memory usage unless you specify:
reconstructor.chunk_tmp_device = torch.device("cpu")
before reconstruction.
The line above moves the reconstructed chunks into CPU memory, which reduces the GPU memory held during reconstruction.
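To illustrate why the temporary device matters, here is a toy model of chunked processing. It is only a sketch and not NKSR's actual implementation: it assumes each chunk needs a fixed scratch working set on the GPU while it is solved and leaves behind a result of fixed size. The function name `peak_gpu_mb` and the numbers used are hypothetical.

```python
def peak_gpu_mb(num_chunks, work_mb, result_mb, offload_to_cpu):
    """Toy model of peak GPU usage during chunked processing.

    Each chunk needs `work_mb` of scratch memory while it is processed and
    produces a result of `result_mb`. If `offload_to_cpu` is False, finished
    results stay on the GPU and accumulate across chunks.
    """
    resident = 0.0  # finished chunk results still held on the GPU
    peak = 0.0
    for _ in range(num_chunks):
        # While a chunk is processed the GPU holds: earlier results that were
        # not offloaded, the scratch memory, and the result being produced.
        peak = max(peak, resident + work_mb + result_mb)
        if not offload_to_cpu:
            resident += result_mb  # the result stays on the GPU
    return peak


if __name__ == "__main__":
    # Hypothetical numbers: 14 chunks, 2000 MB scratch, 500 MB result each.
    print(peak_gpu_mb(14, 2000, 500, offload_to_cpu=False))  # 9000.0
    print(peak_gpu_mb(14, 2000, 500, offload_to_cpu=True))   # 2500.0
```

In this model the peak grows with the number of chunks when results stay on the GPU, and stays at the cost of a single chunk when they are moved to the CPU.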
Hope this helps!!
This line is already in the script:
reconstructor.chunk_tmp_device = torch.device("cpu")
But I don't know why enabling [chunk_size] mode costs more CUDA memory than disabling it.
The problem is that I can reconstruct the surface with [chunk_size] disabled, but I get CUDA out of memory when it is enabled.
I see. That's weird. What is your hardware setting? How large is your GPU memory?
btw one last resort would be to use device='cpu' throughout the pipeline, but that will be slow.
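The CPU fallback can be made conditional. The helper below is a minimal sketch, not part of NKSR: `choose_device` and its `needed_mb` threshold are hypothetical, and the right threshold depends on your scene. It only picks a device string, which can then be wrapped in `torch.device(...)` and passed to `nksr.Reconstructor(device)` as in the example script.

```python
def choose_device(free_gpu_mb, needed_mb):
    """Pick a torch-style device string from the free GPU memory.

    free_gpu_mb: free GPU memory in MB, or None if no GPU is available.
    needed_mb:   hypothetical amount of memory the job is expected to need.
    """
    if free_gpu_mb is not None and free_gpu_mb >= needed_mb:
        return "cuda:0"
    # Fall back to the CPU: slower, but it avoids CUDA out-of-memory errors.
    return "cpu"


if __name__ == "__main__":
    print(choose_device(10757.06, 20000.0))  # cpu
    print(choose_device(24000.0, 20000.0))   # cuda:0
    print(choose_device(None, 20000.0))      # cpu
```

With PyTorch installed, the free memory can be queried with `torch.cuda.mem_get_info()`, which returns free and total bytes for the current device.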
I am using RTX 3060 12GB.
Here is the code I used -- I tried to simulate 12GB gpu by using another torch program to occupy 13GB of memory on my 3090:
import nksr
import torch
from pycg import vis, exp
from pathlib import Path
import numpy as np
from common import load_waymo_example, warning_on_low_memory

if __name__ == '__main__':
    xyz_np, sensor_np = load_waymo_example()

    device = torch.device("cuda:0")
    reconstructor = nksr.Reconstructor(device)
    reconstructor.chunk_tmp_device = torch.device("cpu")

    input_xyz = torch.from_numpy(xyz_np).float().to(device)
    input_sensor = torch.from_numpy(sensor_np).float().to(device)

    field = reconstructor.reconstruct(
        input_xyz, sensor=input_sensor, detail_level=None,
        # Minor configs for better efficiency (not necessary)
        approx_kernel_grad=True, solver_tol=1e-4, fused_mode=True,
        # Chunked reconstruction (if OOM)
        chunk_size=50.0,
        preprocess_fn=nksr.get_estimate_normal_preprocess_fn(64, 85.0)
    )
    mesh = field.extract_dual_mesh(mise_iter=1)
    mesh = vis.mesh(mesh.v, mesh.f)

    vis.show_3d([mesh], [vis.pointcloud(xyz_np)])
and the code runs fine w/o OOM:
(nksr) huangjh@ws:~/shared-home/nkf-wild/nksr-train$ python examples/recons_waymo.py
06-13 19:51:00 (common.py:87) [WARNING] Available GPU memory is 10757.06 MB, we recommend you to have more than 20000.00 MB available.
nksr.chunk: 100%|█████████████████████████████| 14/14 [00:12<00:00, 1.38s/it]
Maybe your torch version is not standard? Did you use the environment provided by environment.yml?
I am not using the Waymo dataset. hhh
By the way, can the [chunk_size] mode be used to reconstruct a very large scene on a device with little CUDA memory, like the RTX 3060?
Yes this is definitely doable, all you need to do is to extract the mesh on CPU. I will provide you with more information after we update the wheels. (There is one bug that I just fixed)
@qpc001 Hi, please install the newest nksr package:
pip install -U nksr -f https://nksr.huangjh.tech/whl/torch-2.0.0+cu118.html
and refer to the recipe here: https://github.com/nv-tlabs/NKSR/blob/public/NKSR-USAGE.md#running-on-a-device-with-small-memory to reconstruct with small memory.
Thank you!
It works. Nice.
I'm using recons_waymo.py. Much more GPU memory is needed when setting [chunk_size=50], which results in CUDA out of memory. Is [chunk_size] supposed to decrease the CPU memory or the GPU memory needed?