Closed: qzhang-cv closed this issue 1 year ago.
Another question: the running time of the code differs between factor=16 and factor=1 even though I use the same num_rays, so the dataset size should not affect the network's runtime.
Firstly, we didn't investigate multi-GPU training, but I think it should just work out of the box.
Secondly, the Occupancy Grid is a major source of speedup. May I know what your use case is that requires you to get rid of it?
Thirdly, since our logic is to skip empty and invisible regions, the training speed gradually goes up as the scene gets cleaned up during training. So if you test with 100 iterations from a random initialization, you won't be able to enjoy this advantage of nerfacc.
Lastly, AFAIK the PyTorch NeRF samples points between the near and far planes with a constant number of samples. Are you comparing against nerfacc with the same number of samples? You would need to set a proper render_step_size to produce the same number of samples with nerfacc. By default we sample roughly 1024 points per ray.
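For a uniform sampler between the near and far planes, the step size that yields a target sample count is just the interval length divided by that count. A minimal sketch (the function name is illustrative, not nerfacc API):

```python
# Back-of-envelope: choose a marching step so that uniformly stepping
# from `near` to `far` yields roughly `target_samples` samples per ray.
def step_size_for_samples(near: float, far: float, target_samples: int) -> float:
    return (far - near) / target_samples

# Example: near=2.0, far=6.0 with 64 samples per ray (a typical
# vanilla-NeRF coarse setting) corresponds to a step of 0.0625.
step = step_size_for_samples(2.0, 6.0, 64)
```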
For your second question, I would check whether the dataloader is responsible for the runtime difference.
I do not use the Occupancy Grid, because when I used it with the default settings I found that n_rendering_samples was zero. Do you mean I could use a constant num_rays without updating the grid values?
Yeah, you could use a constant num_rays; the dynamic ray batch size is not that important.
n_rendering_samples could occasionally be zero if you are working on synthetic data with a white/black background. If a batch of rays doesn't hit the object at all, that batch will have zero samples. As shown in the example script, skipping such an iteration is totally fine.
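The skip logic described above can be sketched as a training-loop fragment (all names here are stand-ins, not the example script's actual functions):

```python
# Hypothetical training-loop fragment: skip iterations whose ray batch
# produced no samples (e.g. every ray in the batch missed the object).
# `render_rays` stands in for whatever returns the rendered colors and
# the sample count; `optimize` stands in for the loss/backward step.
def train(batches, render_rays, optimize):
    for batch in batches:
        rgb, n_rendering_samples = render_rays(batch)
        if n_rendering_samples == 0:
            continue  # nothing to supervise; safe to skip this iteration
        optimize(rgb, batch)
```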
That's assuming everything (e.g., aabb, near, far, cameras) is set up correctly; a misconfiguration can also cause it.
The occupancy grid shouldn't be the cause of n_rendering_samples=0, as the occupancy grid is always "synced" with your network. You can sanity-check it by printing occ_grid.binary.float().mean() to see the fraction of occupied voxels in the grid. If that is zero, then you should investigate why your network is outputting all zeros, or whether you are not properly updating the occupancy grid.
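The check amounts to averaging a boolean grid. In nerfacc the grid's `binary` buffer is a boolean torch tensor; the sketch below mocks it with a NumPy array just to show what the number means:

```python
import numpy as np

# Mock of the suggested sanity check: the mean of the boolean occupancy
# buffer is the fraction of voxels currently marked occupied.
def occupied_fraction(binary_grid: np.ndarray) -> float:
    return float(binary_grid.astype(np.float32).mean())

grid = np.zeros((128, 128, 128), dtype=bool)
grid[32:96, 32:96, 32:96] = True  # pretend the object fills the center
frac = occupied_fraction(grid)  # 64^3 / 128^3 = 0.125
```

A value of exactly 0.0 here is the red flag described above: either the network outputs all zeros, or the grid was never updated.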
My question is that when I use the Occupancy Grid, n_rendering_samples is always zero, even though my scene does not have a white/black background. My setting is:

```python
occupancy_grid = OccupancyGrid(
    roi_aabb=args.aabb,
    resolution=grid_resolution,
    contraction_type=contraction_type,
).to(device)
```

My scene is similar to 360_v2, so I use the same parameters as for 360_v2.
In that case I would suggest checking occ_grid.binary.float().mean().
If that is zero after you update the grid, your network is producing all-zero outputs.
If it is not zero after you update the grid, I would check the cameras, aabb, etc. passed to ray_marching, as it should not give you zero samples.
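One quick way to check the cameras against the aabb is a standard ray/box slab test: do your camera rays actually intersect the box between near and far? A self-contained sketch (an illustrative helper, not nerfacc API; it assumes no exactly-zero direction components):

```python
import numpy as np

# Slab test: does a ray (origin o, direction d) intersect the
# axis-aligned box [aabb_min, aabb_max] within [near, far]?
def ray_hits_aabb(o, d, aabb_min, aabb_max, near=0.0, far=np.inf):
    o, d = np.asarray(o, float), np.asarray(d, float)
    inv = 1.0 / d  # assumes no exactly-zero direction components
    t0 = (np.asarray(aabb_min, float) - o) * inv
    t1 = (np.asarray(aabb_max, float) - o) * inv
    t_enter = np.maximum(np.minimum(t0, t1), near).max()
    t_exit = np.minimum(np.maximum(t0, t1), far).min()
    return bool(t_enter <= t_exit)
```

If most of your rays fail this test against the roi_aabb, the box and the cameras disagree, which would explain zero samples.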
I successfully ran the code with occupancy_grid. Thank you very much! I have another question: how should I set the box size (my scene is similar to the mip-NeRF 360 dataset), and how should I set render_step_size?
The box is the region of world space that you care about. Inside that region the space will not be contracted, so quality there will be better than in other regions. If you know your scene, you can set it directly; otherwise you can compute it automatically from the camera locations and use that as the box (see auto_aabb in the script train_ngp_nerf.py).
render_step_size is the minimum ray-marching step size in world space. Again, it is better to set it based on your scene scale if you know it. If you don't know anything about the scene, I would recommend computing auto_aabb first and dividing its scale by, say, 128 to get render_step_size, which corresponds to roughly 128 samples across the box.
Note that render_step_size can be tuned to trade off quality against speed: the smaller it is, the more samples are drawn, so runtime is slower but quality is better.
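The recipe above can be sketched in a few lines. This is only an illustration of the idea; the real auto_aabb logic lives in nerfacc's train_ngp_nerf.py, and the function names below are made up:

```python
import numpy as np

# Fit an axis-aligned box around the camera positions, then derive a
# marching step that gives roughly `n` samples across the box.
def auto_aabb_from_cameras(cam_positions: np.ndarray) -> np.ndarray:
    lo = cam_positions.min(axis=0)
    hi = cam_positions.max(axis=0)
    return np.concatenate([lo, hi])  # [x0, y0, z0, x1, y1, z1]

def step_from_aabb(aabb: np.ndarray, n: int = 128) -> float:
    scale = float(np.max(aabb[3:] - aabb[:3]))  # longest box edge
    return scale / n

cams = np.array([[0.0, 0.0, 0.0], [4.0, 2.0, 1.0], [2.0, 3.0, 0.5]])
aabb = auto_aabb_from_cameras(cams)
step = step_from_aabb(aabb, n=128)  # 4.0 / 128 = 0.03125
```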
Closing as the problems all seem to be resolved. Feel free to reopen if not.
I want to run train_mlp_nerf.py on our dataset, whose images have 2K resolution. I have set num_rays to a constant value of 1024 and set occupancy_grid to None.
I run the code on 3 A100s, but the running time for 100 iterations of your code is slower than the original PyTorch NeRF. I suspect the reason may be the multi-GPU setup. I want to know how to run your code on multiple GPUs.