OOM issue - Githubissues

Lwy-1998 commented 7 months ago

My platform is Ubuntu 22.04 with an RTX 3090 (24 GB VRAM). When I train on the as_novel_view scene of the NeRF-DS dataset with the default settings, I always run out of memory. This also happens with GUI training. It seems that after a certain iteration the number of 3D Gaussians starts to increase, causing an OOM error. Is there any solution for this?

ingra14m commented 7 months ago

Can you show me the command you ran? I've never encountered an OOM issue on the NeRF-DS dataset.

Lwy-1998 commented 6 months ago

Thanks for the reply. Sorry, I tried to reproduce this problem on the NeRF-DS dataset, but it never showed up again, so never mind :) But it did happen on the HyperNeRF dataset. It seems like the joint training stage of the deformation field tries to allocate an incorrect amount of memory.

Here is the command I used for training with the GUI on the HyperNeRF dataset:

python train_gui.py -s data/HyperNeRF/aleks-teapot -m output/aleks-teapot --eval --is_blender

After 3k iterations, it shows a CUDA OOM error.

RuntimeError: CUDA out of memory. Tried to allocate 15.11 GiB (GPU 0; 23.69 GiB total capacity; 5.81 GiB already allocated; 14.26 GiB free; 7.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
-teapot [07/12 03:24:44]
Tensorboard not available: not logging progress [07/12 03:24:44]
Found dataset.json file, assuming Nerfies data set! [07/12 03:24:45]
Reading Nerfies Info [07/12 03:24:45]
 [07/12 03:24:47]
Loading Training Cameras [07/12 03:24:47]
Loading Test Cameras [07/12 03:24:48]
Number of points at initialisation :  11835 [07/12 03:24:48]
Training progress:   8%|██                         | 3000/40000 [00:46<18:32, 33.26it/s, Loss=0.1931091]Traceback (most recent call last):
  File "train_gui.py", line 776, in <module>
    gui.render()
  File "train_gui.py", line 513, in render
    self.train_step()
  File "train_gui.py", line 568, in train_step
    render_pkg_re = render(viewpoint_cam, self.gaussians, self.pipe, self.background, d_xyz, d_rotation, d_scaling, self.dataset.is_6dof)
  File "/home/lwy/3dGaussian/Deformable-3D-Gaussians/gaussian_renderer/__init__.py", line 115, in render
    cov3D_precomp=cov3D_precomp)
  File "/home/lwy/.conda/envs/gaussian_splatting/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lwy/.conda/envs/gaussian_splatting/lib/python3.7/site-packages/diff_gaussian_rasterization/__init__.py", line 219, in forward
    raster_settings, 
  File "/home/lwy/.conda/envs/gaussian_splatting/lib/python3.7/site-packages/diff_gaussian_rasterization/__init__.py", line 41, in rasterize_gaussians
    raster_settings,
  File "/home/lwy/.conda/envs/gaussian_splatting/lib/python3.7/site-packages/diff_gaussian_rasterization/__init__.py", line 92, in forward
    num_rendered, color, depth, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: CUDA out of memory. Tried to allocate 15.11 GiB (GPU 0; 23.69 GiB total capacity; 5.81 GiB already allocated; 14.26 GiB free; 7.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
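To read the numbers in this message more easily, the arithmetic can be worked out with a small script. This is only a sketch for illustration: the regular expression is written against the exact message format quoted above, and the function names (`parse_oom`, `diagnose`) are hypothetical helpers, not part of PyTorch or of this repository.

```python
import re

# Pattern written for the message format quoted in this thread (an assumption:
# other PyTorch versions may word the message differently).
OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<requested>[\d.]+) GiB "
    r"\(GPU \d+; (?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) GiB free; "
    r"(?P<reserved>[\d.]+) GiB reserved in total by PyTorch\)"
)


def parse_oom(message):
    """Extract the GiB figures from a CUDA OOM message as floats."""
    match = OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("not a recognized CUDA OOM message")
    return {name: float(value) for name, value in match.groupdict().items()}


def diagnose(stats):
    """Compare the single failed request against free and cached memory."""
    shortfall = stats["requested"] - stats["free"]
    cached_unused = stats["reserved"] - stats["allocated"]
    return {
        "shortfall_gib": round(shortfall, 2),
        "cached_unused_gib": round(cached_unused, 2),
        "single_request_exceeds_free": shortfall > 0,
    }


MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 15.11 GiB "
    "(GPU 0; 23.69 GiB total capacity; 5.81 GiB already allocated; "
    "14.26 GiB free; 7.06 GiB reserved in total by PyTorch)"
)

if __name__ == "__main__":
    print(diagnose(parse_oom(MESSAGE)))
```

For the message above, the single 15.11 GiB request is larger than the 14.26 GiB reported as free, and it is also much larger than the 5.81 GiB already allocated. The gap between reserved and allocated memory is about 1.25 GiB, small relative to the request. That pattern looks more like one sudden, very large rasterizer buffer than like gradual growth, although the log alone does not prove the cause.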

ingra14m commented 6 months ago

Hi, this behavior on HyperNeRF is expected.

I found in my experiments that the camera poses of HyperNeRF are inaccurate in many scenes, which can lead to an exponential increase in the number of points (Gaussians). You can refer to the supplementary materials of the latest version of the paper.

Therefore, HyperNeRF is only used as a reference, and the dataset I used in the main text is the NeRF-DS dataset, which has more accurate camera poses. You can refer to the demo on the project page to filter out the usable HyperNeRF scenes.
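One way to notice this kind of runaway growth before the rasterizer allocation fails is to record the number of Gaussians at regular intervals and stop once it exceeds a budget or jumps too quickly. The sketch below is a hypothetical helper, not part of this repository: the function name, the `max_points` budget, and the `max_ratio` threshold are all assumptions chosen for illustration.

```python
def check_point_budget(history, max_points=1_000_000, max_ratio=2.0):
    """Return a reason string if the point count looks runaway, else None.

    history    -- point counts recorded at successive checks (oldest first)
    max_points -- hypothetical absolute budget for the number of Gaussians
    max_ratio  -- hypothetical limit on growth between two consecutive checks
    """
    if not history:
        return None
    latest = history[-1]
    if latest > max_points:
        return f"point count {latest} exceeds budget {max_points}"
    if len(history) >= 2 and history[-2] > 0:
        ratio = latest / history[-2]
        if ratio > max_ratio:
            return f"point count grew {ratio:.1f}x since last check"
    return None


if __name__ == "__main__":
    # 11835 is the initial count from the log above; later values are made up.
    counts = [11835, 15000, 50000]
    print(check_point_budget(counts))
```

In a training loop, the current count would be appended after each densification step (for example from the length of the model's position tensor, which is an assumption about the codebase), and training could be paused or checkpointed as soon as the helper returns a reason.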

Lwy-1998 commented 6 months ago

Hi @ingra14m, sorry, I hadn't noticed that. It is already mentioned in the README file. Thanks for your answer :)

ingra14m / Deformable-3D-Gaussians

OOM issue #24