krrish94 / nerf-pytorch

A PyTorch re-implementation of Neural Radiance Fields

Running out of GPU memory #9

Open tobysharp opened 4 years ago

tobysharp commented 4 years ago

python train_nerf.py --config config/lego.yml

On a Windows machine with an NVIDIA GeForce 2080 Ti:

[TRAIN] Iter: 0 Loss: 0.23798935115337372 PSNR: 6.234424750392607
[VAL] =======> Iter: 0
  0%|                                                                                       | 0/200000 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "train_nerf.py", line 404, in <module>
    main()
  File "train_nerf.py", line 336, in main
    encode_direction_fn=encode_direction_fn,
  File "D:\dev\nerf\nerf\train_utils.py", line 180, in run_one_iter_of_nerf
    for batch in batches
  File "D:\dev\nerf\nerf\train_utils.py", line 180, in <listcomp>
    for batch in batches
  File "D:\dev\nerf\nerf\train_utils.py", line 115, in predict_and_render_radiance
    encode_direction_fn,
  File "D:\dev\nerf\nerf\train_utils.py", line 11, in run_network
    embedded = embed_fn(pts_flat)
  File "D:\dev\nerf\nerf\nerf_helpers.py", line 166, in <lambda>
    x, num_encoding_functions, include_input, log_sampling
  File "D:\dev\nerf\nerf\nerf_helpers.py", line 157, in positional_encoding
    return torch.cat(encoding, dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 3.94 GiB (GPU 0; 11.00 GiB total capacity; 4.49 GiB already allocated; 2.81 GiB free; 5.88 GiB reserved in total by PyTorch)
krrish94 commented 4 years ago

On an 11 GB GPU, I'd recommend lowering the chunksize parameters in the lego.yml config file to about 8192 (here and here). I'd also reduce the number of layers in the neural net to about 4 for a start.
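For context, chunksize controls how many flattened sample points go through the network in one forward pass, so a smaller value means smaller temporary encoding/activation buffers on the GPU. A minimal sketch of the idea (not the repo's exact code; run_in_chunks is just an illustrative name):

import torch

def run_in_chunks(model, pts_flat, chunksize=8192):
    # Split the flattened points into chunks so only one chunk's worth of
    # activations lives on the GPU at a time, then stitch the results back together.
    outputs = []
    for i in range(0, pts_flat.shape[0], chunksize):
        outputs.append(model(pts_flat[i : i + chunksize]))
    return torch.cat(outputs, dim=0)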

holzers commented 4 years ago

It seems quite a bit of your GPU memory is already allocated. Have you tried nvidia-smi to see where it is allocated? Maybe check whether you are running another Python instance that is training or otherwise holding GPU memory.

I am using a 1080 with only 8 GB and haven't had any problems with the default settings in the original NeRF repo.
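If nvidia-smi looks clean, you can also ask PyTorch directly how much it is holding. A quick check along these lines (values printed in GiB; on older PyTorch versions the second counter is torch.cuda.memory_cached instead of memory_reserved):

import torch

if torch.cuda.is_available():
    dev = 0
    print(torch.cuda.get_device_name(dev))
    # Memory actually occupied by tensors vs. memory held by PyTorch's caching allocator.
    print("allocated:", torch.cuda.memory_allocated(dev) / 1024**3, "GiB")
    print("reserved: ", torch.cuda.memory_reserved(dev) / 1024**3, "GiB")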

eshafeeqe commented 4 years ago

Hello, I came across the same problem; attaching the error text below.

Traceback (most recent call last):
  File "train_nerf.py", line 404, in <module>
    main()
  File "train_nerf.py", line 336, in main
    encode_direction_fn=encode_direction_fn,
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/train_utils.py", line 180, in run_one_iter_of_nerf
    for batch in batches
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/train_utils.py", line 180, in <listcomp>
    for batch in batches
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/train_utils.py", line 115, in predict_and_render_radiance
    encode_direction_fn,
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/train_utils.py", line 11, in run_network
    embedded = embed_fn(pts_flat)
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/nerf_helpers.py", line 166, in <lambda>
    x, num_encoding_functions, include_input, log_sampling
  File "/media/aslab/QUT_2/Dev/nerf-pytorch/nerf/nerf_helpers.py", line 157, in positional_encoding
    return torch.cat(encoding, dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 3.94 GiB (GPU 0; 7.94 GiB total capacity; 4.49 GiB already allocated; 1.20 GiB free; 5.88 GiB reserved in total by PyTorch)

My nvidia-smi output

Wed Jun  3 12:24:55 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980M    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P8     8W /  N/A |    421MiB /  8126MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1092      G   /usr/lib/xorg/Xorg                           198MiB |
|    0      2125      G   compiz                                       108MiB |
|    0      2809      G   ...quest-channel-token=4477776435151191749   108MiB |
+-----------------------------------------------------------------------------+
eshafeeqe commented 4 years ago

I reduced the chunk size as recommended, and it started working now. I am using an 8 GB graphics card (GTX 980M).
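If it helps anyone else, you can sanity-check that the edited config is actually being picked up before training. A small sketch (assumes PyYAML is installed; config/lego.yml is the file referenced above):

import yaml

# Load the training config and inspect the nested dict to confirm the
# chunksize values you lowered are the ones the script will see.
with open("config/lego.yml") as f:
    cfg = yaml.safe_load(f)
print(cfg)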