graphdeco-inria / hierarchical-3d-gaussians

Official implementation of the SIGGRAPH 2024 paper "A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets"

CUDA out of memory #4

Open lambdald opened 1 month ago

lambdald commented 1 month ago

Hello, when I was running the small_city dataset, I encountered the following error. How can I solve it?

File "/data/workspace/hierarchical-3d-gaussians/train_coarse.py", line 94, in training
    render_pkg = render_coarse(viewpoint_cam, gaussians, pipe, background, indices = indices)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/hierarchical-3d-gaussians/gaussian_renderer/__init__.py", line 381, in render_coarse
    rendered_image, radii, _ = rasterizer(
                               ^^^^^^^^^^^
  File "/data/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/diff_gaussian_rasterization/__init__.py", line 205, in forward
    return rasterize_gaussians(
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/diff_gaussian_rasterization/__init__.py", line 28, in rasterize_gaussians
    return _RasterizeGaussians.apply(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/diff_gaussian_rasterization/__init__.py", line 84, in forward
    num_rendered, color, radii, geomBuffer, binningBuffer, imgBuffer, invdepths = _C.rasterize_gaussians(*args)
                                                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 184.32 GiB. GPU
White-Mask-230 commented 1 month ago

Hello, the problem is that your computer does not have enough GPU memory to run the program. The easiest solution is to create a GitHub Codespace for this repository and then run the program as you would in Visual Studio Code.

To create the GitHub Codespace:

  1. Open this repository
  2. Press the . key
  3. Open the terminal as you would in Visual Studio Code (in my case "Ctrl + ñ")
  4. The terminal will give you the option to create a local clone or a GitHub Codespace; click the option to create a GitHub Codespace

And that's all.

Snosixtyboo commented 1 month ago

> Hello, the problem is that your computer does not have enough GPU memory to run the program. The easiest solution is to create a GitHub Codespace for this repository […]

Possible, but the message says that the code was trying to allocate 180 GB, which is a bit insane. So it looks like some sort of bug.

@lambdald If you want us to take a look please provide the full Dockerfile you are using. If you are not using Docker, it's gonna be hard to recreate your issue since it's likely setup-specific.

White-Mask-230 commented 1 month ago

True, I didn't see the end of the error.

I get the same error when I run small_city, so it could be a bug in the program. Searching for information, I found this: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch. I'm unable to make it work.
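For reference, the usual advice from that StackOverflow thread is allocator tuning. A minimal sketch (note this only helps with genuine fragmentation-related OOMs; it cannot help when the code requests 184 GiB in one allocation, which points to a bug):

```python
import os

# Must be set before the first CUDA allocation in the process.
# max_split_size_mb caps the size of blocks the caching allocator will
# split, which reduces fragmentation-related out-of-memory errors.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# ...then launch training as usual, e.g.:
# python train_coarse.py -s <dataset>
```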

lambdald commented 1 month ago

> Possible, but the message says that the code was trying to allocate 180 GB, which is a bit insane. So it looks like some sort of bug.
>
> @lambdald If you want us to take a look please provide the full Dockerfile you are using. If you are not using Docker, it's gonna be hard to recreate your issue since it's likely setup-specific.

@Snosixtyboo Sorry, I use conda to manage my environment instead of Docker, and I set up the Python environment following the README.

SunHongyang10 commented 1 month ago

same bug

White-Mask-230 commented 1 month ago

Try this:

  1. Open the Python console by running python3
  2. Import torch: import torch
  3. Clear the cache: torch.cuda.empty_cache()

Reference: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch
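The steps above as a runnable sketch. One caveat: empty_cache() only returns cached, unused blocks held by the current process, so it cannot free memory held by a separate training run and rarely fixes an OOM raised inside the run itself:

```python
import torch

# empty_cache() releases cached blocks back to the CUDA driver. It does
# not free tensors that are still referenced, and it has no effect on
# allocations made by other processes.
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("reserved bytes after empty_cache:", torch.cuda.memory_reserved())
else:
    print("CUDA not available; nothing to clear")
```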

SunHongyang10 commented 1 month ago

> Try this: open the Python console (python3), import torch, and clear the cache with torch.cuda.empty_cache() […]

I tried it, and I still hit the bug.

White-Mask-230 commented 1 month ago

Run this code and tell us what it prints:

import torch

print(torch.cuda.memory_summary(device=None, abbreviated=False))

kevintsq commented 1 month ago

Same problem. OOM was encountered on a GPU with 80 GB memory but not encountered on a GPU with 8 GB memory, using the same dataset.

White-Mask-230 commented 1 month ago

@kevintsq Interesting. Tell us more about the two computers.

kevintsq commented 1 month ago

The former is an HPC using Slurm on Linux, and the latter is a Windows laptop. I should have compiled the submodules for the correct CUDA compute capabilities. I've tried CUDA 12.1, 12.3, 12.4, 12.5 + PyTorch 2.3, 2.4 on the HPC, but the problem persists (with 12.1 it's an illegal memory access instead). The laptop runs well on CUDA 12.4 + PyTorch 2.4.
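One way to check for a capability mismatch (a hedged sketch: get_arch_list() reports what the installed PyTorch binary targets, not what the submodules were compiled for, so it is only a first diagnostic):

```python
import torch

if torch.cuda.is_available():
    # Compute capability of the physical GPU, e.g. (8, 0) for an A100.
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU compute capability: sm_{major}{minor}")
    # Architectures the installed PyTorch binary was built for.
    print("PyTorch built for:", torch.cuda.get_arch_list())
else:
    print("CUDA not available")
```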

LRLVEC commented 2 weeks ago

I ran into the same issue on Ubuntu 22.04 and fixed it by switching from CUDA 12.5 to 12.2.
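A quick way to compare the toolkit versions involved (hedged sketch: it assumes nvcc is on PATH, and torch.version.cuda is the runtime PyTorch was built against, which may legitimately differ from the system toolkit used to compile the submodules):

```python
import shutil
import subprocess

import torch

# CUDA version the installed PyTorch wheel was compiled against
# (None on CPU-only builds).
print("PyTorch built with CUDA:", torch.version.cuda)

# System CUDA toolkit, used when the submodules are compiled from source.
if shutil.which("nvcc"):
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    print(out.stdout.strip().splitlines()[-1])
else:
    print("nvcc not found on PATH")
```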