nerfstudio-project / nerfacc

A General NeRF Acceleration Toolbox in PyTorch.
https://www.nerfacc.com/

How to run on multiple GPUs? #75

Closed qzhang-cv closed 1 year ago

qzhang-cv commented 1 year ago

I want to run train_mlp_nerf.py on our own dataset, whose images have a resolution of 2K. I have set num_rays to a constant value of 1024 and set occupancy_grid to None.

Now, when I run the code on 3 A100s, the running time for 100 iterations of your code is slower than the original PyTorch NeRF. I suspect the reason may be the multi-GPU setup. I want to know how to run your code on multiple GPUs.

qzhang-cv commented 1 year ago

Another question: the running time of the code differs between factor=16 and factor=1. Since I use the same num_rays, the size of the dataset should not affect the runtime of the network.

liruilong940607 commented 1 year ago

Firstly, we didn't investigate multiple GPUs, but I think it should just work out of the box.
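For what it's worth, a generic PyTorch data-parallel setup could look like the sketch below. This is plain PyTorch DDP, not nerfacc-specific API; `radiance_field` is a placeholder for whatever module the training script builds, and the launch is assumed to be `torchrun`-style with each rank training on its own shard of rays.

```python
# Hypothetical sketch: wrapping a NeRF MLP in DistributedDataParallel.
# Assumes a `torchrun --nproc_per_node=3 train_mlp_nerf.py` style launch;
# the model name is a placeholder, not nerfacc API.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def setup_model(radiance_field: torch.nn.Module) -> torch.nn.Module:
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK in the environment;
    # fall back to a single-process group when they are absent.
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    if not dist.is_initialized():
        dist.init_process_group(
            backend="gloo",  # use "nccl" when training on GPUs
            init_method="tcp://127.0.0.1:29500",
            rank=rank,
            world_size=world_size,
        )
    # DDP averages gradients across ranks on backward(); each rank
    # should sample its own ray batch.
    return DDP(radiance_field)
```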

Secondly, the Occupancy Grid is a major source of speedup. May I know what your use case is that requires getting rid of it?

Thirdly, since our logic is to skip empty and invisible regions, the training speed gradually goes up as your scene gets cleaned up during training. So if you test it with 100 iterations from a random initialization, you won't be able to enjoy this advantage of nerfacc.

Lastly, AFAIK the PyTorch NeRF samples points between the near and far planes with a constant number of samples. Are you comparing against nerfacc with the same number of samples? You would need to set a proper render_step_size to create the same number of samples with nerfacc. By default we sample roughly 1024 per ray.
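Concretely, one way to pick a step size that matches a uniform near/far sampler is shown below. The near/far/num_samples values are illustrative, not taken from the thread:

```python
# Sketch: choosing render_step_size so ray marching draws roughly the
# same number of samples per ray as a uniform sampler over [near, far].
def matching_step_size(near: float, far: float, num_samples: int) -> float:
    # A uniform sampler places num_samples points over [near, far];
    # marching at this step size yields about the same count per ray.
    return (far - near) / num_samples

step = matching_step_size(near=2.0, far=6.0, num_samples=64)
# step == 0.0625
```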

liruilong940607 commented 1 year ago

For your second question, I would check whether the dataloader is responsible for the runtime difference.

qzhang-cv commented 1 year ago

> Secondly, the Occupancy Grid is a major source of speedup. May I know what your use case is that requires getting rid of it?

I do not use the Occupancy Grid because, when I use it with the default settings, I found that n_rendering_samples is zero. Do you mean I could use a constant num_rays without updating the value?

liruilong940607 commented 1 year ago

Yeah, you could use a constant num_rays. The dynamic ray batch size is not that important.

n_rendering_samples can occasionally be zero if you are working on synthetic data with a white/black background. If a batch of rays doesn't hit the object at all, that batch will have zero samples. As shown in the example script, skipping such an iteration is totally fine.
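That skip might look like this in a training loop. `render_batch` is a hypothetical stand-in for the actual rendering call, not nerfacc API:

```python
# Sketch of the skip logic: if a sampled ray batch hits nothing
# (zero rendering samples), move on to the next batch instead of
# computing a loss on an empty render.
def train_step(render_batch) -> bool:
    rgb, n_rendering_samples = render_batch()
    if n_rendering_samples == 0:
        return False  # nothing to optimize against; skip this iteration
    # ... compute the loss on rgb and step the optimizer ...
    return True
```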

That assumes everything (e.g., aabb, near, far, cameras) is set up correctly. An incorrect setup can also cause zero samples.

The occupancy grid shouldn't be the cause of n_rendering_samples=0, because the occupancy grid is always "synced" with your network. You can sanity-check it by printing occ_grid.binary.float().mean() to see the percentage of occupied voxels in the grid. If that is zero, you should investigate why your network is outputting all zeros, or whether you are properly updating the occupancy grid.
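A minimal sketch of that sanity check, using a stand-in boolean tensor in place of the real occ_grid.binary:

```python
# Sanity check: fraction of occupied voxels in the binary grid.
# `binary` here is a stand-in for occ_grid.binary (a boolean tensor).
import torch

binary = torch.zeros(128, 128, 128, dtype=torch.bool)
binary[40:90, 40:90, 40:90] = True  # pretend the network marked a region occupied

occupied_fraction = binary.float().mean().item()
print(f"occupied voxels: {occupied_fraction:.2%}")
# A value of 0.0 would mean the network outputs zero density everywhere,
# or the grid was never updated.
```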

qzhang-cv commented 1 year ago

> Yeah, you could use a constant num_rays. The dynamic ray batch size is not that important.
>
> n_rendering_samples can occasionally be zero if you are working on synthetic data with a white/black background. If a batch of rays doesn't hit the object at all, that batch will have zero samples. As shown in the example script, skipping such an iteration is totally fine.
>
> That assumes everything (e.g., aabb, near, far, cameras) is set up correctly. An incorrect setup can also cause zero samples.

My question is that when I use the Occupancy Grid, n_rendering_samples is always zero, and my scene does not have a white/black background. My setting is:

```python
occupancy_grid = OccupancyGrid(
    roi_aabb=args.aabb,
    resolution=grid_resolution,
    contraction_type=contraction_type,
).to(device)
```

My scene is similar to 360_v2, so I use the same parameters as for 360_v2.

liruilong940607 commented 1 year ago

In that case I would suggest checking occ_grid.binary.float().mean().

If that is zero after you update the grid, your network is producing all-zero outputs.

If it is not zero after you update the grid, I would check the cameras, aabb, etc. used for ray_marching, as it should not give you zero samples.

qzhang-cv commented 1 year ago

I successfully ran the code with occupancy_grid~ Thank you very much. I have another question: how do I set the box size (my scene is similar to the Mip-NeRF 360 dataset), and how do I set render_step_size?

liruilong940607 commented 1 year ago

The box is the region in world space that you care about. Within that region the space is not contracted, so quality will be better than in other regions. If you know your scene, you can set it directly; otherwise you can compute it automatically from the camera locations and use that as the box (see auto_aabb in the script train_ngp_nerf.py).

The render_step_size is the minimum ray-marching step size in world space. Again, it is better if you know your scene scale and set it based on that. If you don't know anything about the scene, I would recommend computing the auto_aabb first and dividing the scale of the auto_aabb by, say, 128 to get the render_step_size, which corresponds to roughly 128 samples across the box.
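A rough sketch of that heuristic is below. It mirrors the idea of auto_aabb (a box around the camera locations, with the step size derived from its scale) but is not the exact code from train_ngp_nerf.py; the camera positions are made-up example values.

```python
# Sketch: derive a box from camera positions and divide its largest
# side by ~128 to get render_step_size.
import torch


def auto_step_size(camera_origins: torch.Tensor, samples_in_box: int = 128):
    # Box that tightly encloses all camera centers, as a rough
    # stand-in for the region of interest.
    lo = camera_origins.min(dim=0).values
    hi = camera_origins.max(dim=0).values
    aabb = torch.cat([lo, hi])  # [x0, y0, z0, x1, y1, z1]
    scale = (hi - lo).max().item()
    # ~samples_in_box marching steps across the box's largest side.
    return aabb, scale / samples_in_box


origins = torch.tensor([[-2.0, -1.0, 0.0], [2.0, 1.0, 3.0], [0.0, 2.0, -1.0]])
aabb, step = auto_step_size(origins)
# largest side is 4.0, so step == 4.0 / 128 == 0.03125
```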

liruilong940607 commented 1 year ago

Note that render_step_size can be tuned to trade off quality against speed. The smaller it is, the more samples are drawn, so the runtime will be slower but the quality will be better.

liruilong940607 commented 1 year ago

Closed as the problems seem to be all resolved. Feel free to reopen it if not.