@liruilong940607 is this related to the recent nerfacc bump to 0.5.2?
Hi @tancik @liruilong940607
Also, I noticed that if I use the `instant-ngp` method (not `instant-ngp-bounded`), training requires around 5GB of VRAM, but when the viewer is opened, allocated VRAM suddenly jumps up to 15-17GB. I would assume all nerfacc-based methods have this problem.
By the way, the `instant-ngp-bounded` method uses all images for training and the same images for evaluation, whereas `nerfacto` splits the images into train and eval sets. Should `instant-ngp-bounded` also do the same?
Ah, so I looked into it, and this OOM is because of the `cone_angle=0.004` -> `cone_angle=0.0` change in PR #1809 for `instant-ngp-bounded`. This leads to many more samples for `instant-ngp-bounded` than before.
The reason for changing this default behavior is to match how the NGP paper trains on the bounded nerf-synthetic data.
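To give a feel for the difference, here is a rough back-of-the-envelope sketch (not the actual nerfacc kernel; the `max(render_step_size, t * cone_angle)` step rule and the numbers are just my illustration of the idea) of how the per-ray sample count changes between the two settings:

```python
# Rough sketch (illustrative only): with cone_angle=0.0 the marching step is
# constant, while with cone_angle>0 the step grows with distance along the ray,
# so far-away space is sampled much more coarsely.
def samples_per_ray(near, far, render_step_size, cone_angle):
    t, n = near, 0
    while t < far:
        t += max(render_step_size, t * cone_angle)  # distance-scaled step size
        n += 1
    return n

near, far, step = 0.05, 20.0, 0.01
print(samples_per_ray(near, far, step, cone_angle=0.0))    # ~2000 samples
print(samples_per_ray(near, far, step, cone_angle=0.004))  # ~770 samples, several times fewer
```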
For the `instant-ngp` model, I reverted to the pre-#1809 settings and the behavior is the same (15GB with the viewer opened).
But note that the ultimate reason for such high GPU consumption with NGP-based methods is that they sample densely along the ray (~1000 or more samples per ray) and gradually reduce the number of samples by pruning the space during training. So in the early stage of training, rendering is slow and memory-intensive because of the large number of samples, but as training progresses the situation becomes much better.
That being said, ideally we should have dynamic control over the number of rays for the viewer, just like what we have for training (keeping the number of samples roughly constant). But I'm not sure it's worth implementing that just to support NGP methods. An easy fix would be to simply reduce `viewer.num_rays_per_chunk`.
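For a sense of why that knob matters: the viewer renders rays in chunks, and peak memory scales roughly with `num_rays_per_chunk` times the number of samples per ray, so shrinking the chunk size directly caps the per-chunk sample buffers. The sketch below uses made-up per-sample sizes purely for illustration (not measured from nerfstudio). The field should also be overridable from the CLI, e.g. something like `--viewer.num_rays_per_chunk 16384`, though the exact flag spelling may depend on your nerfstudio version.

```python
# Back-of-the-envelope sketch (made-up numbers, not nerfstudio internals):
# peak memory ~ num_rays_per_chunk * samples_per_ray * bytes_per_sample.
bytes_per_sample = 4 * 64                       # assume ~64 float32 values kept per sample
samples_per_ray = 1000                          # early training, before the grid is pruned
for num_rays_per_chunk in (32768, 16384, 4096):
    peak = num_rays_per_chunk * samples_per_ray * bytes_per_sample
    print(f"{num_rays_per_chunk:>6} rays/chunk -> ~{peak / 2**30:.1f} GiB of sample buffers")
```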
BTW, if you want to try out the NGP method, I would recommend using the `instant-ngp` model instead of `instant-ngp-bounded` on real scenes like `poster`, even though they might be bounded (the bounding box you get for real scenes is usually neither accurate nor tight, so you don't want to rely on it too much). Also, `instant-ngp` by default sets the multi-res level of the occupancy grid to 4, while `instant-ngp-bounded` sets it to 1, assuming we pay "uniform attention" to everywhere in the scene bbox.
@liruilong940607
Thanks for the explanation! I can confirm that `instant-ngp` uses less memory after the occupancy grid is optimized. For example, after 5k steps memory consumption is 11GB instead of the 15GB at the beginning. But I observe strange behavior: if I open the viewer after 5k steps and don't move the camera, memory consumption is 11GB. As soon as I change the camera position (translate or rotate in the viewer), memory consumption jumps to 18GB. Do you maybe know why that is?
As we discussed, the GPU memory that NGP-based methods consume is not a constant value. It depends not only on the training status but also on the viewpoint, because ultimately it is proportional to the number of samples that have to be evaluated as rays travel through the scene. Changing the viewpoint changes the paths the rays traverse, and thus leads to a different number of samples depending on the occupancy of the scene.
For example, if you move the camera behind a wall, then everything there appears occupied (because that region was never observed in training, so it never got trained), and NGP will give you dense samples.
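As a toy illustration of that point (this is not nerfacc code, just the idea: samples are only generated in cells the occupancy grid still marks as occupied, and unobserved space is never pruned):

```python
import numpy as np

def samples_along_ray(occupied, t_vals):
    # keep a candidate sample only if its grid cell is still marked occupied
    cells = np.minimum(t_vals.astype(int), len(occupied) - 1)
    return int(occupied[cells].sum())

t_vals = np.arange(0.0, 64.0, 0.05)            # candidate sample positions along one ray
trained = np.zeros(64, dtype=bool)
trained[20:24] = True                          # after training, only a thin slab stays occupied
unseen = np.ones(64, dtype=bool)               # region never seen in training: nothing pruned

print(samples_along_ray(trained, t_vals))      # 80 samples
print(samples_along_ray(unseen, t_vals))       # 1280 samples -- every candidate kept
```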
@liruilong940607
That was also my assumption, but I was a little confused by such a huge memory jump (from 11GB to 18GB) due to the viewpoint alone.
@tancik
I will close the issue if there is nothing further to discuss.
I opened a PR to reduce the default `viewer.num_rays_per_chunk` so that the GPU consumption at the beginning should be roughly 11 GB.
@liruilong940607 Thanks!
**Describe the bug**
I use the latest version of nerfstudio (built from source). During training with `instant-ngp-bounded`, VRAM consumption is normal (around 4GB). However, if I use the viewer, more than 24GB of VRAM is needed, then I get a CUDA out-of-memory error and the viewer crashes, but the training continues.

**To Reproduce**
Steps to reproduce the behavior:
ns-train instant-ngp-bounded --data data/nerfstudio/poster/